diff --git a/docs/_posts/ahmedlone127/2024-09-04-sent_bert_base_spanish_wwm_uncased_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-04-sent_bert_base_spanish_wwm_uncased_pipeline_es.md new file mode 100644 index 00000000000000..31f9b1e49628ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-04-sent_bert_base_spanish_wwm_uncased_pipeline_es.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Castilian, Spanish sent_bert_base_spanish_wwm_uncased_pipeline pipeline BertSentenceEmbeddings from dccuchile +author: John Snow Labs +name: sent_bert_base_spanish_wwm_uncased_pipeline +date: 2024-09-04 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_spanish_wwm_uncased_pipeline` is a Castilian, Spanish model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_uncased_pipeline_es_5.5.0_3.0_1725415961164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_uncased_pipeline_es_5.5.0_3.0_1725415961164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_spanish_wwm_uncased_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_spanish_wwm_uncased_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_spanish_wwm_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|410.2 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-05-qa_synthetic_data_only_finetuned_v1_0_en.md b/docs/_posts/ahmedlone127/2024-09-05-qa_synthetic_data_only_finetuned_v1_0_en.md new file mode 100644 index 00000000000000..9e3d2f0b42603e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-qa_synthetic_data_only_finetuned_v1_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English qa_synthetic_data_only_finetuned_v1_0 XlmRoBertaForQuestionAnswering from am-infoweb +author: John Snow Labs +name: qa_synthetic_data_only_finetuned_v1_0 +date: 2024-09-05 +tags: [en, open_source, onnx, question_answering, xlm_roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_synthetic_data_only_finetuned_v1_0` is a English model originally trained by am-infoweb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_synthetic_data_only_finetuned_v1_0_en_5.5.0_3.0_1725557042495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_synthetic_data_only_finetuned_v1_0_en_5.5.0_3.0_1725557042495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = XlmRoBertaForQuestionAnswering.pretrained("qa_synthetic_data_only_finetuned_v1_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = XlmRoBertaForQuestionAnswering.pretrained("qa_synthetic_data_only_finetuned_v1_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_synthetic_data_only_finetuned_v1_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|803.6 MB| + +## References + +https://huggingface.co/am-infoweb/QA_SYNTHETIC_DATA_ONLY_Finetuned_v1.0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-06-claim_extraction_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-06-claim_extraction_classifier_en.md new file mode 100644 index 00000000000000..e58eec7e15b188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-06-claim_extraction_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English claim_extraction_classifier DeBertaForSequenceClassification from KnutJaegersberg +author: John Snow Labs +name: claim_extraction_classifier +date: 2024-09-06 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`claim_extraction_classifier` is a English model originally trained by KnutJaegersberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/claim_extraction_classifier_en_5.5.0_3.0_1725611976387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/claim_extraction_classifier_en_5.5.0_3.0_1725611976387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DeBertaForSequenceClassification.pretrained("claim_extraction_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DeBertaForSequenceClassification.pretrained("claim_extraction_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|claim_extraction_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/KnutJaegersberg/claim_extraction_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-bert_wnut_token_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-07-bert_wnut_token_classifier_en.md new file mode 100644 index 00000000000000..e7c15210e33ccb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-bert_wnut_token_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_wnut_token_classifier DistilBertForTokenClassification from ZappY-AI +author: John Snow Labs +name: bert_wnut_token_classifier +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_wnut_token_classifier` is a English model originally trained by ZappY-AI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_wnut_token_classifier_en_5.5.0_3.0_1725729872514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_wnut_token_classifier_en_5.5.0_3.0_1725729872514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("bert_wnut_token_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("bert_wnut_token_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_wnut_token_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/ZappY-AI/bert-wnut-token-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en.md b/docs/_posts/ahmedlone127/2024-09-07-bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en.md new file mode 100644 index 00000000000000..af787ab0c74b20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en_5.5.0_3.0_1725720999381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner_en_5.5.0_3.0_1725720999381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_combined_train_drugtemist_dev_85_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|440.6 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-combined-train-drugtemist-dev-85-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_gekyume_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_gekyume_pipeline_en.md new file mode 100644 index 00000000000000..3481b3fd176c9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_qa_model_gekyume_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_gekyume_pipeline pipeline DistilBertForQuestionAnswering from Gekyume +author: John Snow Labs +name: burmese_awesome_qa_model_gekyume_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_gekyume_pipeline` is a English model originally trained by Gekyume. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_gekyume_pipeline_en_5.5.0_3.0_1725746264861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_gekyume_pipeline_en_5.5.0_3.0_1725746264861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_gekyume_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_gekyume_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_gekyume_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Gekyume/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-r_fb_sms_lm_en.md b/docs/_posts/ahmedlone127/2024-09-07-r_fb_sms_lm_en.md new file mode 100644 index 00000000000000..38397793a00f06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-r_fb_sms_lm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English r_fb_sms_lm RoBertaEmbeddings from adnankhawaja +author: John Snow Labs +name: r_fb_sms_lm +date: 2024-09-07 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`r_fb_sms_lm` is a English model originally trained by adnankhawaja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/r_fb_sms_lm_en_5.5.0_3.0_1725678505363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/r_fb_sms_lm_en_5.5.0_3.0_1725678505363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("r_fb_sms_lm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("r_fb_sms_lm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|r_fb_sms_lm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/adnankhawaja/R_FB_SMS_LM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_jaoo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_jaoo_pipeline_en.md new file mode 100644 index 00000000000000..36e45090684cc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-burmese_awesome_wnut_jaoo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_jaoo_pipeline pipeline DistilBertForTokenClassification from gonzalezrostani +author: John Snow Labs +name: burmese_awesome_wnut_jaoo_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_jaoo_pipeline` is a English model originally trained by gonzalezrostani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_jaoo_pipeline_en_5.5.0_3.0_1725837720041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_jaoo_pipeline_en_5.5.0_3.0_1725837720041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_jaoo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_jaoo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_jaoo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/gonzalezrostani/my_awesome_wnut_JAOo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_taeseon_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_taeseon_en.md new file mode 100644 index 00000000000000..f3e6d2a2940cd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_taeseon_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_taeseon DistilBertForQuestionAnswering from taeseon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_taeseon +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_taeseon` is a English model originally trained by taeseon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_taeseon_en_5.5.0_3.0_1725818616750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_taeseon_en_5.5.0_3.0_1725818616750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_taeseon","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_taeseon", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_taeseon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/taeseon/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-mpnet_base_all_nli_triplet_korruz_en.md b/docs/_posts/ahmedlone127/2024-09-08-mpnet_base_all_nli_triplet_korruz_en.md new file mode 100644 index 00000000000000..b6ae5c8f138dd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-mpnet_base_all_nli_triplet_korruz_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English mpnet_base_all_nli_triplet_korruz MPNetEmbeddings from korruz +author: John Snow Labs +name: mpnet_base_all_nli_triplet_korruz +date: 2024-09-08 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mpnet_base_all_nli_triplet_korruz` is a English model originally trained by korruz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpnet_base_all_nli_triplet_korruz_en_5.5.0_3.0_1725816094251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpnet_base_all_nli_triplet_korruz_en_5.5.0_3.0_1725816094251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("mpnet_base_all_nli_triplet_korruz","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("mpnet_base_all_nli_triplet_korruz","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpnet_base_all_nli_triplet_korruz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|390.1 MB| + +## References + +https://huggingface.co/korruz/mpnet-base-all-nli-triplet \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en.md b/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en.md new file mode 100644 index 00000000000000..ea3eac44a045b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2 MarianTransformer from mekjr1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2 +date: 2024-09-08 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2` is a English model originally trained by mekjr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en_5.5.0_3.0_1725824135212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2_en_5.5.0_3.0_1725824135212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_pbb_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|539.9 MB| + +## References + +https://huggingface.co/mekjr1/opus-mt-en-es-finetuned-es-to-pbb-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-spark_name_armenian_tonga_tonga_islands_english_en.md b/docs/_posts/ahmedlone127/2024-09-08-spark_name_armenian_tonga_tonga_islands_english_en.md new file mode 100644 index 00000000000000..579b43a1ff4a31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-spark_name_armenian_tonga_tonga_islands_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spark_name_armenian_tonga_tonga_islands_english MarianTransformer from ihebaker10 +author: John Snow Labs +name: spark_name_armenian_tonga_tonga_islands_english +date: 2024-09-08 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spark_name_armenian_tonga_tonga_islands_english` is a English model originally trained by ihebaker10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spark_name_armenian_tonga_tonga_islands_english_en_5.5.0_3.0_1725824974673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spark_name_armenian_tonga_tonga_islands_english_en_5.5.0_3.0_1725824974673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("spark_name_armenian_tonga_tonga_islands_english","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("spark_name_armenian_tonga_tonga_islands_english","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spark_name_armenian_tonga_tonga_islands_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|523.5 MB| + +## References + +https://huggingface.co/ihebaker10/spark-name-hy-to-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-spotify_podcast_advertising_classification_en.md b/docs/_posts/ahmedlone127/2024-09-08-spotify_podcast_advertising_classification_en.md new file mode 100644 index 00000000000000..2aaa9e72ac4c35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-spotify_podcast_advertising_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spotify_podcast_advertising_classification BertForSequenceClassification from morenolq +author: John Snow Labs +name: spotify_podcast_advertising_classification +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spotify_podcast_advertising_classification` is a English model originally trained by morenolq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spotify_podcast_advertising_classification_en_5.5.0_3.0_1725761516783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spotify_podcast_advertising_classification_en_5.5.0_3.0_1725761516783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("spotify_podcast_advertising_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("spotify_podcast_advertising_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spotify_podcast_advertising_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/morenolq/spotify-podcast-advertising-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-token_classification_test_sahithiankireddy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-token_classification_test_sahithiankireddy_pipeline_en.md new file mode 100644 index 00000000000000..caf79bc8dbc68a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-token_classification_test_sahithiankireddy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English token_classification_test_sahithiankireddy_pipeline pipeline DistilBertForTokenClassification from sahithiankireddy +author: John Snow Labs +name: token_classification_test_sahithiankireddy_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`token_classification_test_sahithiankireddy_pipeline` is a English model originally trained by sahithiankireddy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/token_classification_test_sahithiankireddy_pipeline_en_5.5.0_3.0_1725788965117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/token_classification_test_sahithiankireddy_pipeline_en_5.5.0_3.0_1725788965117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("token_classification_test_sahithiankireddy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("token_classification_test_sahithiankireddy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|token_classification_test_sahithiankireddy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.4 MB| + +## References + +https://huggingface.co/sahithiankireddy/token_classification_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-xtremedistil_l6_h256_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-xtremedistil_l6_h256_uncased_pipeline_en.md new file mode 100644 index 00000000000000..7974915042c028 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-xtremedistil_l6_h256_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xtremedistil_l6_h256_uncased_pipeline pipeline BertForSequenceClassification from microsoft +author: John Snow Labs +name: xtremedistil_l6_h256_uncased_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xtremedistil_l6_h256_uncased_pipeline` is a English model originally trained by microsoft. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xtremedistil_l6_h256_uncased_pipeline_en_5.5.0_3.0_1725801955061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xtremedistil_l6_h256_uncased_pipeline_en_5.5.0_3.0_1725801955061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xtremedistil_l6_h256_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xtremedistil_l6_h256_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xtremedistil_l6_h256_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|47.3 MB| + +## References + +https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en.md b/docs/_posts/ahmedlone127/2024-09-09-005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en.md new file mode 100644 index 00000000000000..97e0b0ef250ac0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 005_microsoft_deberta_v3_base_finetuned_yahoo_140k DeBertaForSequenceClassification from diogopaes10 +author: John Snow Labs +name: 005_microsoft_deberta_v3_base_finetuned_yahoo_140k +date: 2024-09-09 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`005_microsoft_deberta_v3_base_finetuned_yahoo_140k` is a English model originally trained by diogopaes10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en_5.5.0_3.0_1725879622346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/005_microsoft_deberta_v3_base_finetuned_yahoo_140k_en_5.5.0_3.0_1725879622346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DeBertaForSequenceClassification.pretrained("005_microsoft_deberta_v3_base_finetuned_yahoo_140k","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DeBertaForSequenceClassification.pretrained("005_microsoft_deberta_v3_base_finetuned_yahoo_140k", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|005_microsoft_deberta_v3_base_finetuned_yahoo_140k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|659.6 MB| + +## References + +https://huggingface.co/diogopaes10/005-microsoft-deberta-v3-base-finetuned-yahoo-140k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_acezxn_en.md b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_acezxn_en.md new file mode 100644 index 00000000000000..65f34a73b57608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-burmese_awesome_qa_model_acezxn_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_acezxn DistilBertForQuestionAnswering from acezxn +author: John Snow Labs +name: burmese_awesome_qa_model_acezxn +date: 2024-09-09 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_acezxn` is a English model originally trained by acezxn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_acezxn_en_5.5.0_3.0_1725876818400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_acezxn_en_5.5.0_3.0_1725876818400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_acezxn","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_acezxn", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_acezxn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/acezxn/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en.md new file mode 100644 index 00000000000000..c450e340570308 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0 DistilBertForTokenClassification from HungChau +author: John Snow Labs +name: distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0 +date: 2024-09-09 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0` is a English model originally trained by HungChau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en_5.5.0_3.0_1725889772953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0_en_5.5.0_3.0_1725889772953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_concept_extraction_wikipedia_v1_0_concept_extraction_indoiranian_languages_v1_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/HungChau/distilbert-base-cased-concept-extraction-wikipedia-v1.0-concept-extraction-iir-v1.0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en.md new file mode 100644 index 00000000000000..751586d1519802 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline pipeline DistilBertForQuestionAnswering from allistair99 +author: John Snow Labs +name: distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline` is a English model originally trained by allistair99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en_5.5.0_3.0_1725892236084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline_en_5.5.0_3.0_1725892236084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_squad_bilstm_finetuned_squad_pure2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/allistair99/distilbert-base-uncased-distilled-squad-BiLSTM-finetuned-squad-pure2 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_ruidanwang_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_ruidanwang_en.md new file mode 100644 index 00000000000000..d724c3cb88526c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_ruidanwang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_ruidanwang DistilBertEmbeddings from ruidanwang +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_ruidanwang +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_ruidanwang` is a English model originally trained by ruidanwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_ruidanwang_en_5.5.0_3.0_1725868264059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_ruidanwang_en_5.5.0_3.0_1725868264059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_ruidanwang","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_uncased_finetuned_imdb_ruidanwang","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_ruidanwang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/ruidanwang/distilbert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en.md new file mode 100644 index 00000000000000..396f929da2fc11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline pipeline DistilBertEmbeddings from victorbarra +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline` is a English model originally trained by victorbarra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en_5.5.0_3.0_1725905588381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline_en_5.5.0_3.0_1725905588381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_victorbarra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/victorbarra/distilbert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-dummy_model_verfallen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-dummy_model_verfallen_pipeline_en.md new file mode 100644 index 00000000000000..99bba134d1a75d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-dummy_model_verfallen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dummy_model_verfallen_pipeline pipeline CamemBertEmbeddings from verfallen +author: John Snow Labs +name: dummy_model_verfallen_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dummy_model_verfallen_pipeline` is a English model originally trained by verfallen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dummy_model_verfallen_pipeline_en_5.5.0_3.0_1725851295777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dummy_model_verfallen_pipeline_en_5.5.0_3.0_1725851295777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dummy_model_verfallen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dummy_model_verfallen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dummy_model_verfallen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|264.0 MB| + +## References + +https://huggingface.co/verfallen/dummy-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-finetuning_amaskedlanguage_en.md b/docs/_posts/ahmedlone127/2024-09-09-finetuning_amaskedlanguage_en.md new file mode 100644 index 00000000000000..2ae369fe554f5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-finetuning_amaskedlanguage_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_amaskedlanguage DistilBertEmbeddings from DeveloperAya +author: John Snow Labs +name: finetuning_amaskedlanguage +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_amaskedlanguage` is a English model originally trained by DeveloperAya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_amaskedlanguage_en_5.5.0_3.0_1725921617036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_amaskedlanguage_en_5.5.0_3.0_1725921617036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("finetuning_amaskedlanguage","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("finetuning_amaskedlanguage","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_amaskedlanguage| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[distilbert]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/DeveloperAya/FineTuning_aMaskedLanguage \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-maltese_hitz_catalan_basque_pipeline_ca.md b/docs/_posts/ahmedlone127/2024-09-09-maltese_hitz_catalan_basque_pipeline_ca.md new file mode 100644 index 00000000000000..71d6ca30ad095a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-maltese_hitz_catalan_basque_pipeline_ca.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Catalan, Valencian maltese_hitz_catalan_basque_pipeline pipeline MarianTransformer from HiTZ +author: John Snow Labs +name: maltese_hitz_catalan_basque_pipeline +date: 2024-09-09 +tags: [ca, open_source, pipeline, onnx] +task: Translation +language: ca +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_hitz_catalan_basque_pipeline` is a Catalan, Valencian model originally trained by HiTZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_hitz_catalan_basque_pipeline_ca_5.5.0_3.0_1725891980977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_hitz_catalan_basque_pipeline_ca_5.5.0_3.0_1725891980977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("maltese_hitz_catalan_basque_pipeline", lang = "ca") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("maltese_hitz_catalan_basque_pipeline", lang = "ca") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_hitz_catalan_basque_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ca| +|Size:|225.8 MB| + +## References + +https://huggingface.co/HiTZ/mt-hitz-ca-eu + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-mpnet_base_natural_questions_mnrl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-mpnet_base_natural_questions_mnrl_pipeline_en.md new file mode 100644 index 00000000000000..dd213a16f74551 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-mpnet_base_natural_questions_mnrl_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mpnet_base_natural_questions_mnrl_pipeline pipeline MPNetEmbeddings from tomaarsen +author: John Snow Labs +name: mpnet_base_natural_questions_mnrl_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mpnet_base_natural_questions_mnrl_pipeline` is a English model originally trained by tomaarsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpnet_base_natural_questions_mnrl_pipeline_en_5.5.0_3.0_1725897150565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpnet_base_natural_questions_mnrl_pipeline_en_5.5.0_3.0_1725897150565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mpnet_base_natural_questions_mnrl_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mpnet_base_natural_questions_mnrl_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpnet_base_natural_questions_mnrl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/tomaarsen/mpnet-base-natural-questions-mnrl + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en.md new file mode 100644 index 00000000000000..ffbae5660d7a91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline pipeline MarianTransformer from qwerty22tau +author: John Snow Labs +name: opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline` is a English model originally trained by qwerty22tau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en_5.5.0_3.0_1725891613581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline_en_5.5.0_3.0_1725891613581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_russian_finetuned_english_tonga_tonga_islands_russian_qwerty22tau_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|526.0 MB| + +## References + +https://huggingface.co/qwerty22tau/opus-mt-en-ru-finetuned-en-to-ru + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_ft_v3_3_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_ft_v3_3_epochs_pipeline_en.md new file mode 100644 index 00000000000000..e88cd70d528f0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_ft_v3_3_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_ft_v3_3_epochs_pipeline pipeline MarianTransformer from abdiharyadi +author: John Snow Labs +name: opus_maltese_ft_v3_3_epochs_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_ft_v3_3_epochs_pipeline` is a English model originally trained by abdiharyadi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_ft_v3_3_epochs_pipeline_en_5.5.0_3.0_1725840182886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_ft_v3_3_epochs_pipeline_en_5.5.0_3.0_1725840182886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_ft_v3_3_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_ft_v3_3_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_ft_v3_3_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|482.7 MB| + +## References + +https://huggingface.co/abdiharyadi/opus-mt-ft-v3-3-epochs + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_korean_english_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_korean_english_en.md new file mode 100644 index 00000000000000..dec8e8f27be5d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_korean_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_korean_english MarianTransformer from seohyun-choi +author: John Snow Labs +name: opus_maltese_korean_english +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_korean_english` is a English model originally trained by seohyun-choi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_en_5.5.0_3.0_1725865364983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_korean_english_en_5.5.0_3.0_1725865364983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_korean_english","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_korean_english","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_korean_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.6 MB| + +## References + +https://huggingface.co/seohyun-choi/opus-mt-ko-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en.md new file mode 100644 index 00000000000000..a3cd8ab2f9b1df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_walloon_english_finetuned_npomo_english_15_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_walloon_english_finetuned_npomo_english_15_epochs +date: 2024-09-09 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_walloon_english_finetuned_npomo_english_15_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en_5.5.0_3.0_1725864279280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_walloon_english_finetuned_npomo_english_15_epochs_en_5.5.0_3.0_1725864279280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_walloon_english_finetuned_npomo_english_15_epochs","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_walloon_english_finetuned_npomo_english_15_epochs","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_walloon_english_finetuned_npomo_english_15_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|506.2 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-wa-en-finetuned-npomo-en-15-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-refpydst_5p_icdst_split_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-refpydst_5p_icdst_split_v3_pipeline_en.md new file mode 100644 index 00000000000000..3075c3170105f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-refpydst_5p_icdst_split_v3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English refpydst_5p_icdst_split_v3_pipeline pipeline MPNetEmbeddings from Brendan +author: John Snow Labs +name: refpydst_5p_icdst_split_v3_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`refpydst_5p_icdst_split_v3_pipeline` is a English model originally trained by Brendan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/refpydst_5p_icdst_split_v3_pipeline_en_5.5.0_3.0_1725896436202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/refpydst_5p_icdst_split_v3_pipeline_en_5.5.0_3.0_1725896436202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("refpydst_5p_icdst_split_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("refpydst_5p_icdst_split_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|refpydst_5p_icdst_split_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/Brendan/refpydst-5p-icdst-split-v3 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-roberta_base_corener_en.md b/docs/_posts/ahmedlone127/2024-09-09-roberta_base_corener_en.md new file mode 100644 index 00000000000000..1409a870eadedf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-roberta_base_corener_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_corener RoBertaEmbeddings from aiola +author: John Snow Labs +name: roberta_base_corener +date: 2024-09-09 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_corener` is a English model originally trained by aiola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_corener_en_5.5.0_3.0_1725910000235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_corener_en_5.5.0_3.0_1725910000235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_corener","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_corener","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_corener| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/aiola/roberta-base-corener \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-translate_model_v3_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-translate_model_v3_2_pipeline_en.md new file mode 100644 index 00000000000000..11aa91d02f262b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-translate_model_v3_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English translate_model_v3_2_pipeline pipeline MarianTransformer from gshields +author: John Snow Labs +name: translate_model_v3_2_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translate_model_v3_2_pipeline` is a English model originally trained by gshields. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translate_model_v3_2_pipeline_en_5.5.0_3.0_1725891557120.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translate_model_v3_2_pipeline_en_5.5.0_3.0_1725891557120.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("translate_model_v3_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("translate_model_v3_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translate_model_v3_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|523.5 MB| + +## References + +https://huggingface.co/gshields/translate_model_v3.2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-bert_gemma_strongoversight_vllm_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-bert_gemma_strongoversight_vllm_0_pipeline_en.md new file mode 100644 index 00000000000000..30180a31d08c55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-bert_gemma_strongoversight_vllm_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_gemma_strongoversight_vllm_0_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_gemma_strongoversight_vllm_0_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_gemma_strongoversight_vllm_0_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_gemma_strongoversight_vllm_0_pipeline_en_5.5.0_3.0_1726009691198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_gemma_strongoversight_vllm_0_pipeline_en_5.5.0_3.0_1726009691198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_gemma_strongoversight_vllm_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_gemma_strongoversight_vllm_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_gemma_strongoversight_vllm_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_gemma-strongOversight-vllm_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-facets_gpt_expanswer_1234_en.md b/docs/_posts/ahmedlone127/2024-09-10-facets_gpt_expanswer_1234_en.md new file mode 100644 index 00000000000000..d558a044c426ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-facets_gpt_expanswer_1234_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English facets_gpt_expanswer_1234 MPNetEmbeddings from ingeol +author: John Snow Labs +name: facets_gpt_expanswer_1234 +date: 2024-09-10 +tags: [en, open_source, onnx, embeddings, mpnet] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`facets_gpt_expanswer_1234` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/facets_gpt_expanswer_1234_en_5.5.0_3.0_1725996947387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/facets_gpt_expanswer_1234_en_5.5.0_3.0_1725996947387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = MPNetEmbeddings.pretrained("facets_gpt_expanswer_1234","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val embeddings = MPNetEmbeddings.pretrained("facets_gpt_expanswer_1234","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|facets_gpt_expanswer_1234| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[mpnet]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/facets_gpt_expanswer_1234 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-gsm_finetunned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-gsm_finetunned_pipeline_en.md new file mode 100644 index 00000000000000..68f75ea481defa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-gsm_finetunned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English gsm_finetunned_pipeline pipeline MPNetEmbeddings from anomys +author: John Snow Labs +name: gsm_finetunned_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gsm_finetunned_pipeline` is a English model originally trained by anomys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gsm_finetunned_pipeline_en_5.5.0_3.0_1725978385500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gsm_finetunned_pipeline_en_5.5.0_3.0_1725978385500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gsm_finetunned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gsm_finetunned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gsm_finetunned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/anomys/gsm-finetunned + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en.md new file mode 100644 index 00000000000000..63fedb8bdc0056 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline pipeline MPNetEmbeddings from shrinivasbjoshi +author: John Snow Labs +name: r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline` is a English model originally trained by shrinivasbjoshi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en_5.5.0_3.0_1725963430814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline_en_5.5.0_3.0_1725963430814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|r2_w266_setfit_mbti_multiclass_hypsearch_mpnet_nov30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/shrinivasbjoshi/r2-w266-setfit-mbti-multiclass-hypsearch-mpnet-nov30 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-roberta_base_roberta_model_enyonam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-roberta_base_roberta_model_enyonam_pipeline_en.md new file mode 100644 index 00000000000000..65779bf4677f51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-roberta_base_roberta_model_enyonam_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_roberta_model_enyonam_pipeline pipeline RoBertaForSequenceClassification from Enyonam +author: John Snow Labs +name: roberta_base_roberta_model_enyonam_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_roberta_model_enyonam_pipeline` is a English model originally trained by Enyonam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_roberta_model_enyonam_pipeline_en_5.5.0_3.0_1725962633171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_roberta_model_enyonam_pipeline_en_5.5.0_3.0_1725962633171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_roberta_model_enyonam_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_roberta_model_enyonam_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_roberta_model_enyonam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.6 MB| + +## References + +https://huggingface.co/Enyonam/roberta-base-Roberta-Model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en.md new file mode 100644 index 00000000000000..b92374a7f80435 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline pipeline MPNetEmbeddings from mrm8488 +author: John Snow Labs +name: setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline` is a English model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en_5.5.0_3.0_1725964274890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline_en_5.5.0_3.0_1725964274890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|setfit_mpnet_base_v2_finetuned_senteval_cree_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/mrm8488/setfit-mpnet-base-v2-finetuned-sentEval-CR + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-vetbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-vetbert_pipeline_en.md new file mode 100644 index 00000000000000..5b7ff8afa7ce77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-vetbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English vetbert_pipeline pipeline BertEmbeddings from havocy28 +author: John Snow Labs +name: vetbert_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`vetbert_pipeline` is a English model originally trained by havocy28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vetbert_pipeline_en_5.5.0_3.0_1725989221341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vetbert_pipeline_en_5.5.0_3.0_1725989221341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("vetbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("vetbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vetbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|402.8 MB| + +## References + +https://huggingface.co/havocy28/VetBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_emo_t_milanlproc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_emo_t_milanlproc_pipeline_en.md new file mode 100644 index 00000000000000..86b71a849e79e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_emo_t_milanlproc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_emo_t_milanlproc_pipeline pipeline XlmRoBertaForSequenceClassification from MilaNLProc +author: John Snow Labs +name: xlm_emo_t_milanlproc_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_emo_t_milanlproc_pipeline` is a English model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_emo_t_milanlproc_pipeline_en_5.5.0_3.0_1725967907793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_emo_t_milanlproc_pipeline_en_5.5.0_3.0_1725967907793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_emo_t_milanlproc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_emo_t_milanlproc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_emo_t_milanlproc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/MilaNLProc/xlm-emo-t + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en.md new file mode 100644 index 00000000000000..f57074adbcfba8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline pipeline XlmRoBertaForTokenClassification from OscarNav +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline` is a English model originally trained by OscarNav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en_5.5.0_3.0_1725974778208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline_en_5.5.0_3.0_1725974778208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_oscarnav_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/OscarNav/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-albert_base_v2weighted_hoax_classifier_definition_en.md b/docs/_posts/ahmedlone127/2024-09-11-albert_base_v2weighted_hoax_classifier_definition_en.md new file mode 100644 index 00000000000000..5a6b1a9b21f84e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-albert_base_v2weighted_hoax_classifier_definition_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_base_v2weighted_hoax_classifier_definition AlbertForSequenceClassification from research-dump +author: John Snow Labs +name: albert_base_v2weighted_hoax_classifier_definition +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_v2weighted_hoax_classifier_definition` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_v2weighted_hoax_classifier_definition_en_5.5.0_3.0_1726027198098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_v2weighted_hoax_classifier_definition_en_5.5.0_3.0_1726027198098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_base_v2weighted_hoax_classifier_definition","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_base_v2weighted_hoax_classifier_definition", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_v2weighted_hoax_classifier_definition| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/research-dump/albert-base-v2weighted_hoax_classifier_definition \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-amazon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-amazon_pipeline_en.md new file mode 100644 index 00000000000000..a5118a6e9b1b81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-amazon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_pipeline pipeline DistilBertForSequenceClassification from bl03 +author: John Snow Labs +name: amazon_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_pipeline` is a English model originally trained by bl03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_pipeline_en_5.5.0_3.0_1726017805766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_pipeline_en_5.5.0_3.0_1726017805766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amazon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amazon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bl03/amazon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en.md new file mode 100644 index 00000000000000..37392c2be7ecad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726039179363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726039179363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_wikipedia1_2_5m_wikipedia_french_with_masking_seed3_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-wikipedia1_2.5M_wikipedia_french-with-Masking-seed3-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_16_87_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_16_87_pipeline_en.md new file mode 100644 index 00000000000000..22605a784c6b25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-best_model_yelp_polarity_16_87_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English best_model_yelp_polarity_16_87_pipeline pipeline AlbertForSequenceClassification from simonycl +author: John Snow Labs +name: best_model_yelp_polarity_16_87_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`best_model_yelp_polarity_16_87_pipeline` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_16_87_pipeline_en_5.5.0_3.0_1726013196169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/best_model_yelp_polarity_16_87_pipeline_en_5.5.0_3.0_1726013196169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("best_model_yelp_polarity_16_87_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("best_model_yelp_polarity_16_87_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|best_model_yelp_polarity_16_87_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/simonycl/best_model-yelp_polarity-16-87 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_30len_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_30len_pipeline_en.md new file mode 100644 index 00000000000000..703974ff8b10c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_30len_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_30len_pipeline pipeline RoBertaForQuestionAnswering from yashwan2003 +author: John Snow Labs +name: burmese_awesome_qa_model_30len_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_30len_pipeline` is a English model originally trained by yashwan2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_30len_pipeline_en_5.5.0_3.0_1726036604022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_30len_pipeline_en_5.5.0_3.0_1726036604022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_30len_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_30len_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_30len_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/yashwan2003/my_awesome_qa_model_30len + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-craft_bionlp_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-11-craft_bionlp_roberta_base_en.md new file mode 100644 index 00000000000000..fd2937c6660a79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-craft_bionlp_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English craft_bionlp_roberta_base RoBertaEmbeddings from abhi1nandy2 +author: John Snow Labs +name: craft_bionlp_roberta_base +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`craft_bionlp_roberta_base` is a English model originally trained by abhi1nandy2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/craft_bionlp_roberta_base_en_5.5.0_3.0_1726065724659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/craft_bionlp_roberta_base_en_5.5.0_3.0_1726065724659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("craft_bionlp_roberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("craft_bionlp_roberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|craft_bionlp_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/abhi1nandy2/Craft-bionlp-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-imdbreviews_classification_distilbert_v02_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-imdbreviews_classification_distilbert_v02_pipeline_en.md new file mode 100644 index 00000000000000..27175b76dad47a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-imdbreviews_classification_distilbert_v02_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_v02_pipeline pipeline AlbertForSequenceClassification from maherh +author: John Snow Labs +name: imdbreviews_classification_distilbert_v02_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_v02_pipeline` is a English model originally trained by maherh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_pipeline_en_5.5.0_3.0_1726013077355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_pipeline_en_5.5.0_3.0_1726013077355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdbreviews_classification_distilbert_v02_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdbreviews_classification_distilbert_v02_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_v02_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/maherh/imdbreviews_classification_distilbert_v02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-mrr_latest_27_7_en.md b/docs/_posts/ahmedlone127/2024-09-11-mrr_latest_27_7_en.md new file mode 100644 index 00000000000000..c0a51dbbd379cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-mrr_latest_27_7_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English mrr_latest_27_7 RoBertaForQuestionAnswering from prajwalJumde +author: John Snow Labs +name: mrr_latest_27_7 +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mrr_latest_27_7` is a English model originally trained by prajwalJumde. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mrr_latest_27_7_en_5.5.0_3.0_1726061948579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mrr_latest_27_7_en_5.5.0_3.0_1726061948579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("mrr_latest_27_7","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("mrr_latest_27_7", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mrr_latest_27_7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/prajwalJumde/MRR-Latest-27-7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_base_aon_tfidf_wce_unsampled_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_base_aon_tfidf_wce_unsampled_en.md new file mode 100644 index 00000000000000..1d72c986b44056 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_base_aon_tfidf_wce_unsampled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_base_aon_tfidf_wce_unsampled MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_base_aon_tfidf_wce_unsampled +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_base_aon_tfidf_wce_unsampled` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_base_aon_tfidf_wce_unsampled_en_5.5.0_3.0_1726038498397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_base_aon_tfidf_wce_unsampled_en_5.5.0_3.0_1726038498397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_base_aon_tfidf_wce_unsampled","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_base_aon_tfidf_wce_unsampled","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_base_aon_tfidf_wce_unsampled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.4 MB| + +## References + +https://huggingface.co/ethansimrm/opus_base_AoN_tfidf_wce_unsampled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en.md new file mode 100644 index 00000000000000..3bc4fff072bec8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_romance_english_finetuned_npomo_english_10_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_romance_english_finetuned_npomo_english_10_epochs +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_romance_english_finetuned_npomo_english_10_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726047151112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_romance_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726047151112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_romance_english_finetuned_npomo_english_10_epochs","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_romance_english_finetuned_npomo_english_10_epochs","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_romance_english_finetuned_npomo_english_10_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|538.9 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-ROMANCE-en-finetuned-npomo-en-10-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-parallel_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-11-parallel_roberta_large_en.md new file mode 100644 index 00000000000000..58b2c6596d039d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-parallel_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English parallel_roberta_large RoBertaEmbeddings from luffycodes +author: John Snow Labs +name: parallel_roberta_large +date: 2024-09-11 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`parallel_roberta_large` is a English model originally trained by luffycodes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/parallel_roberta_large_en_5.5.0_3.0_1726032112397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/parallel_roberta_large_en_5.5.0_3.0_1726032112397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("parallel_roberta_large","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("parallel_roberta_large","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|parallel_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/luffycodes/parallel-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-q2e_ep3_1122_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-q2e_ep3_1122_pipeline_en.md new file mode 100644 index 00000000000000..487356e889365c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-q2e_ep3_1122_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English q2e_ep3_1122_pipeline pipeline MPNetEmbeddings from ingeol +author: John Snow Labs +name: q2e_ep3_1122_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`q2e_ep3_1122_pipeline` is a English model originally trained by ingeol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/q2e_ep3_1122_pipeline_en_5.5.0_3.0_1726089062286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/q2e_ep3_1122_pipeline_en_5.5.0_3.0_1726089062286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("q2e_ep3_1122_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("q2e_ep3_1122_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|q2e_ep3_1122_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/ingeol/q2e_ep3_1122 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_subjqa_movies_2_quocc_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_subjqa_movies_2_quocc_en.md new file mode 100644 index 00000000000000..879d7aded89cc1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_finetuned_subjqa_movies_2_quocc_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_quocc RoBertaForQuestionAnswering from Quocc +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_quocc +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_quocc` is a English model originally trained by Quocc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_quocc_en_5.5.0_3.0_1726055867824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_quocc_en_5.5.0_3.0_1726055867824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_quocc","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_quocc", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_quocc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/Quocc/roberta-finetuned-subjqa-movies_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en.md new file mode 100644 index 00000000000000..19bb7d871ea15f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline pipeline RoBertaForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en_5.5.0_3.0_1726039448200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline_en_5.5.0_3.0_1726039448200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_few_shot_k_256_finetuned_squad_seed_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/anas-awadalla/roberta-large-few-shot-k-256-finetuned-squad-seed-0 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..7209fee7698384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline pipeline RoBertaForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en_5.5.0_3.0_1726062835452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline_en_5.5.0_3.0_1726062835452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_roberta_base_cree_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.7 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-RoBerta-base-CR-POS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..6d6790c98c0881 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed1_roberta_base_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed1_roberta_base_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed1_roberta_base_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en_5.5.0_3.0_1726096128184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed1_roberta_base_pipeline_en_5.5.0_3.0_1726096128184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed1_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed1_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed1_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.5 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed1-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sms_spam_model_v1_1_en.md b/docs/_posts/ahmedlone127/2024-09-11-sms_spam_model_v1_1_en.md new file mode 100644 index 00000000000000..a4613230be536a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sms_spam_model_v1_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sms_spam_model_v1_1 DistilBertForSequenceClassification from xia0t1an +author: John Snow Labs +name: sms_spam_model_v1_1 +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sms_spam_model_v1_1` is a English model originally trained by xia0t1an. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sms_spam_model_v1_1_en_5.5.0_3.0_1726014372284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sms_spam_model_v1_1_en_5.5.0_3.0_1726014372284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sms_spam_model_v1_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sms_spam_model_v1_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sms_spam_model_v1_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/xia0t1an/sms-spam-model-v1_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_french_leotunganh_en.md b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_french_leotunganh_en.md new file mode 100644 index 00000000000000..d38524c9903c35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-xlm_roberta_base_finetuned_panx_german_french_leotunganh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_leotunganh XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_leotunganh +date: 2024-09-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_leotunganh` is a English model originally trained by LeoTungAnh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_leotunganh_en_5.5.0_3.0_1726019692484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_leotunganh_en_5.5.0_3.0_1726019692484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_leotunganh","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_leotunganh", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_leotunganh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|842.4 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_question_ans_en.md b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_question_ans_en.md new file mode 100644 index 00000000000000..909e561c2c287a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-burmese_awesome_qa_model_question_ans_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_question_ans DistilBertForQuestionAnswering from wyxwangmed +author: John Snow Labs +name: burmese_awesome_qa_model_question_ans +date: 2024-09-12 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_question_ans` is a English model originally trained by wyxwangmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_question_ans_en_5.5.0_3.0_1726180689020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_question_ans_en_5.5.0_3.0_1726180689020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_question_ans","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_question_ans", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_question_ans| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/wyxwangmed/my_awesome_qa_model_question_ans \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-lab1_random_yimeiyang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-lab1_random_yimeiyang_pipeline_en.md new file mode 100644 index 00000000000000..40d7eb98237d85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-lab1_random_yimeiyang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_random_yimeiyang_pipeline pipeline MarianTransformer from yimeiyang +author: John Snow Labs +name: lab1_random_yimeiyang_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_random_yimeiyang_pipeline` is a English model originally trained by yimeiyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_random_yimeiyang_pipeline_en_5.5.0_3.0_1726161445654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_random_yimeiyang_pipeline_en_5.5.0_3.0_1726161445654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab1_random_yimeiyang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab1_random_yimeiyang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_random_yimeiyang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.8 MB| + +## References + +https://huggingface.co/yimeiyang/lab1_random + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en.md new file mode 100644 index 00000000000000..587206c651f70c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline pipeline MarianTransformer from HamdanXI +author: John Snow Labs +name: marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline` is a English model originally trained by HamdanXI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en_5.5.0_3.0_1726126819339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline_en_5.5.0_3.0_1726126819339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marefa_maltese_english_arabic_parallel_10k_splitted_euclidean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|528.0 MB| + +## References + +https://huggingface.co/HamdanXI/marefa-mt-en-ar-parallel-10k-splitted-euclidean + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en.md new file mode 100644 index 00000000000000..616121e1e341c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline pipeline MarianTransformer from AI-newbie89 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline` is a English model originally trained by AI-newbie89. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en_5.5.0_3.0_1726126131856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline_en_5.5.0_3.0_1726126131856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_ai_newbie89_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.9 MB| + +## References + +https://huggingface.co/AI-newbie89/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en.md new file mode 100644 index 00000000000000..9364b57827c196 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline pipeline MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en_5.5.0_3.0_1726161826051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline_en_5.5.0_3.0_1726161826051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_multiple_languages_english_finetuned_npomo_english_15_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|533.3 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-mul-en-finetuned-npomo-en-15-epochs + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en.md new file mode 100644 index 00000000000000..8ef435243a27de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline pipeline RoBertaForSequenceClassification from AnaBach +author: John Snow Labs +name: roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline` is a English model originally trained by AnaBach. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en_5.5.0_3.0_1726118239552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline_en_5.5.0_3.0_1726118239552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_amazon_reviews_multi_anabach_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|446.8 MB| + +## References + +https://huggingface.co/AnaBach/roberta-base-bne-finetuned-amazon_reviews_multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-roberta_base_epoch_12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_epoch_12_pipeline_en.md new file mode 100644 index 00000000000000..568a0d76fc266b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-roberta_base_epoch_12_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_12_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_12_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_12_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_12_pipeline_en_5.5.0_3.0_1726113253790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_12_pipeline_en_5.5.0_3.0_1726113253790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_12_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_12_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.1 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_12 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-rulebert_v0_2_k0_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-12-rulebert_v0_2_k0_pipeline_it.md new file mode 100644 index 00000000000000..93b55f6ed0e726 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-rulebert_v0_2_k0_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian rulebert_v0_2_k0_pipeline pipeline XlmRoBertaForSequenceClassification from ribesstefano +author: John Snow Labs +name: rulebert_v0_2_k0_pipeline +date: 2024-09-12 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rulebert_v0_2_k0_pipeline` is a Italian model originally trained by ribesstefano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rulebert_v0_2_k0_pipeline_it_5.5.0_3.0_1726146446100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rulebert_v0_2_k0_pipeline_it_5.5.0_3.0_1726146446100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rulebert_v0_2_k0_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rulebert_v0_2_k0_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rulebert_v0_2_k0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|870.5 MB| + +## References + +https://huggingface.co/ribesstefano/RuleBert-v0.2-k0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-sent_bert_base_german_dbmdz_cased_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-12-sent_bert_base_german_dbmdz_cased_pipeline_de.md new file mode 100644 index 00000000000000..8c4c69bc6304ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-sent_bert_base_german_dbmdz_cased_pipeline_de.md @@ -0,0 +1,71 @@ +--- +layout: model +title: German sent_bert_base_german_dbmdz_cased_pipeline pipeline BertSentenceEmbeddings from google-bert +author: John Snow Labs +name: sent_bert_base_german_dbmdz_cased_pipeline +date: 2024-09-12 +tags: [de, open_source, pipeline, onnx] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_dbmdz_cased_pipeline` is a German model originally trained by google-bert. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_dbmdz_cased_pipeline_de_5.5.0_3.0_1726177929003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_dbmdz_cased_pipeline_de_5.5.0_3.0_1726177929003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_german_dbmdz_cased_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_german_dbmdz_cased_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_dbmdz_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|410.4 MB| + +## References + +https://huggingface.co/google-bert/bert-base-german-dbmdz-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-sentiment_analysis_distilbert_ghylb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-sentiment_analysis_distilbert_ghylb_pipeline_en.md new file mode 100644 index 00000000000000..975f8491e79706 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-sentiment_analysis_distilbert_ghylb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_distilbert_ghylb_pipeline pipeline DistilBertForSequenceClassification from GhylB +author: John Snow Labs +name: sentiment_analysis_distilbert_ghylb_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_distilbert_ghylb_pipeline` is a English model originally trained by GhylB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_ghylb_pipeline_en_5.5.0_3.0_1726100339292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_ghylb_pipeline_en_5.5.0_3.0_1726100339292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_distilbert_ghylb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_distilbert_ghylb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_distilbert_ghylb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/GhylB/Sentiment_Analysis_DistilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_khadija267_en.md b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_khadija267_en.md new file mode 100644 index 00000000000000..a1d1bbafd49611 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-xlm_roberta_base_finetuned_panx_german_french_khadija267_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_khadija267 XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_khadija267 +date: 2024-09-12 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_khadija267` is a English model originally trained by khadija267. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_khadija267_en_5.5.0_3.0_1726165176560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_khadija267_en_5.5.0_3.0_1726165176560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_khadija267","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_khadija267", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_khadija267| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-akan_1_ak.md b/docs/_posts/ahmedlone127/2024-09-13-akan_1_ak.md new file mode 100644 index 00000000000000..737401a2920445 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-akan_1_ak.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Akan akan_1 WhisperForCTC from devkyle +author: John Snow Labs +name: akan_1 +date: 2024-09-13 +tags: [ak, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ak +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`akan_1` is a Akan model originally trained by devkyle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/akan_1_ak_5.5.0_3.0_1726252437935.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/akan_1_ak_5.5.0_3.0_1726252437935.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("akan_1","ak") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("akan_1", "ak") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|akan_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ak| +|Size:|389.9 MB| + +## References + +https://huggingface.co/devkyle/Akan-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-babylm_roberta_base_epoch_10_en.md b/docs/_posts/ahmedlone127/2024-09-13-babylm_roberta_base_epoch_10_en.md new file mode 100644 index 00000000000000..989dceabc84bab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-babylm_roberta_base_epoch_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English babylm_roberta_base_epoch_10 RoBertaEmbeddings from Raj-Sanjay-Shah +author: John Snow Labs +name: babylm_roberta_base_epoch_10 +date: 2024-09-13 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babylm_roberta_base_epoch_10` is a English model originally trained by Raj-Sanjay-Shah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_10_en_5.5.0_3.0_1726197290444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_10_en_5.5.0_3.0_1726197290444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("babylm_roberta_base_epoch_10","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("babylm_roberta_base_epoch_10","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babylm_roberta_base_epoch_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/Raj-Sanjay-Shah/babyLM_roberta_base_epoch_10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-bleurt_large_512_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-bleurt_large_512_pipeline_en.md new file mode 100644 index 00000000000000..4c5ffabd52e2ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-bleurt_large_512_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bleurt_large_512_pipeline pipeline BertForSequenceClassification from Elron +author: John Snow Labs +name: bleurt_large_512_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bleurt_large_512_pipeline` is a English model originally trained by Elron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bleurt_large_512_pipeline_en_5.5.0_3.0_1726201595866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bleurt_large_512_pipeline_en_5.5.0_3.0_1726201595866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bleurt_large_512_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bleurt_large_512_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bleurt_large_512_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Elron/bleurt-large-512 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_maniack_en.md b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_maniack_en.md new file mode 100644 index 00000000000000..3d658bfb6d41e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-burmese_awesome_qa_model_maniack_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_maniack DistilBertForQuestionAnswering from maniack +author: John Snow Labs +name: burmese_awesome_qa_model_maniack +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_maniack` is a English model originally trained by maniack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_maniack_en_5.5.0_3.0_1726245512994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_maniack_en_5.5.0_3.0_1726245512994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_maniack","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_maniack", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_maniack| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/maniack/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_civil_comments_wilds_10k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_civil_comments_wilds_10k_pipeline_en.md new file mode 100644 index 00000000000000..778db89bfa6705 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-deberta_v3_base_civil_comments_wilds_10k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_base_civil_comments_wilds_10k_pipeline pipeline DeBertaForSequenceClassification from shlomihod +author: John Snow Labs +name: deberta_v3_base_civil_comments_wilds_10k_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_civil_comments_wilds_10k_pipeline` is a English model originally trained by shlomihod. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_civil_comments_wilds_10k_pipeline_en_5.5.0_3.0_1726270920386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_civil_comments_wilds_10k_pipeline_en_5.5.0_3.0_1726270920386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_base_civil_comments_wilds_10k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_base_civil_comments_wilds_10k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_civil_comments_wilds_10k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|623.8 MB| + +## References + +https://huggingface.co/shlomihod/deberta-v3-base-civil-comments-wilds-10k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_cased_reviews_v1_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_cased_reviews_v1_en.md new file mode 100644 index 00000000000000..9ac6f0165db0f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_cased_reviews_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_cased_reviews_v1 DistilBertForSequenceClassification from Asteriks +author: John Snow Labs +name: distilbert_cased_reviews_v1 +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_cased_reviews_v1` is a English model originally trained by Asteriks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_cased_reviews_v1_en_5.5.0_3.0_1726242772995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_cased_reviews_v1_en_5.5.0_3.0_1726242772995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_cased_reviews_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_cased_reviews_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_cased_reviews_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.1 MB| + +## References + +https://huggingface.co/Asteriks/distilbert-cased-reviews-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_17_de.md b/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_17_de.md new file mode 100644 index 00000000000000..58fcee133cbf36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-gqa_roberta_german_legal_squad_part_augmented_17_de.md @@ -0,0 +1,86 @@ +--- +layout: model +title: German gqa_roberta_german_legal_squad_part_augmented_17 RoBertaForQuestionAnswering from farid1088 +author: John Snow Labs +name: gqa_roberta_german_legal_squad_part_augmented_17 +date: 2024-09-13 +tags: [de, open_source, onnx, question_answering, roberta] +task: Question Answering +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gqa_roberta_german_legal_squad_part_augmented_17` is a German model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_part_augmented_17_de_5.5.0_3.0_1726199051540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gqa_roberta_german_legal_squad_part_augmented_17_de_5.5.0_3.0_1726199051540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("gqa_roberta_german_legal_squad_part_augmented_17","de") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("gqa_roberta_german_legal_squad_part_augmented_17", "de") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gqa_roberta_german_legal_squad_part_augmented_17| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|de| +|Size:|465.8 MB| + +## References + +https://huggingface.co/farid1088/GQA_RoBERTa_German_legal_SQuAD_part_augmented_17 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-mach_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-mach_2_pipeline_en.md new file mode 100644 index 00000000000000..0a03136f488c5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-mach_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mach_2_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: mach_2_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mach_2_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mach_2_pipeline_en_5.5.0_3.0_1726187610389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mach_2_pipeline_en_5.5.0_3.0_1726187610389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mach_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mach_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mach_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Mach_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-pubchem10m_smiles_bpe_396_250_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-pubchem10m_smiles_bpe_396_250_pipeline_en.md new file mode 100644 index 00000000000000..20d3452ca75243 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-pubchem10m_smiles_bpe_396_250_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pubchem10m_smiles_bpe_396_250_pipeline pipeline RoBertaEmbeddings from seyonec +author: John Snow Labs +name: pubchem10m_smiles_bpe_396_250_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pubchem10m_smiles_bpe_396_250_pipeline` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pubchem10m_smiles_bpe_396_250_pipeline_en_5.5.0_3.0_1726185785260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pubchem10m_smiles_bpe_396_250_pipeline_en_5.5.0_3.0_1726185785260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pubchem10m_smiles_bpe_396_250_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pubchem10m_smiles_bpe_396_250_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pubchem10m_smiles_bpe_396_250_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.0 MB| + +## References + +https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_396_250 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_bertimbaulaw_base_portuguese_cased_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_bertimbaulaw_base_portuguese_cased_en.md new file mode 100644 index 00000000000000..e5b741f217732c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_bertimbaulaw_base_portuguese_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bertimbaulaw_base_portuguese_cased BertSentenceEmbeddings from alfaneo +author: John Snow Labs +name: sent_bertimbaulaw_base_portuguese_cased +date: 2024-09-13 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbaulaw_base_portuguese_cased` is a English model originally trained by alfaneo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbaulaw_base_portuguese_cased_en_5.5.0_3.0_1726223712526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbaulaw_base_portuguese_cased_en_5.5.0_3.0_1726223712526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbaulaw_base_portuguese_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbaulaw_base_portuguese_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbaulaw_base_portuguese_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|405.8 MB| + +## References + +https://huggingface.co/alfaneo/bertimbaulaw-base-portuguese-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_en.md new file mode 100644 index 00000000000000..7895e1f86f829f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_rubiobert BertSentenceEmbeddings from alexyalunin +author: John Snow Labs +name: sent_rubiobert +date: 2024-09-13 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_rubiobert` is a English model originally trained by alexyalunin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_rubiobert_en_5.5.0_3.0_1726246224036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_rubiobert_en_5.5.0_3.0_1726246224036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_rubiobert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_rubiobert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_rubiobert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|667.1 MB| + +## References + +https://huggingface.co/alexyalunin/RuBioBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-socbert_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-socbert_final_pipeline_en.md new file mode 100644 index 00000000000000..1e81597938a2da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-socbert_final_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English socbert_final_pipeline pipeline RoBertaEmbeddings from sarkerlab +author: John Snow Labs +name: socbert_final_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`socbert_final_pipeline` is a English model originally trained by sarkerlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/socbert_final_pipeline_en_5.5.0_3.0_1726197533621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/socbert_final_pipeline_en_5.5.0_3.0_1726197533621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("socbert_final_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("socbert_final_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|socbert_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|534.6 MB| + +## References + +https://huggingface.co/sarkerlab/SocBERT-final + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-test_model_yzhangqs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-test_model_yzhangqs_pipeline_en.md new file mode 100644 index 00000000000000..d0b8c76cc3022a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-test_model_yzhangqs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_model_yzhangqs_pipeline pipeline DistilBertForSequenceClassification from yzhangqs +author: John Snow Labs +name: test_model_yzhangqs_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_yzhangqs_pipeline` is a English model originally trained by yzhangqs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_yzhangqs_pipeline_en_5.5.0_3.0_1726262021604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_yzhangqs_pipeline_en_5.5.0_3.0_1726262021604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_model_yzhangqs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_model_yzhangqs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_yzhangqs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yzhangqs/Test_Model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_hindi_sanchit_gandhi_hi.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_hindi_sanchit_gandhi_hi.md new file mode 100644 index 00000000000000..ce4cae34fd565d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_hindi_sanchit_gandhi_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_sanchit_gandhi WhisperForCTC from sanchit-gandhi +author: John Snow Labs +name: whisper_small_hindi_sanchit_gandhi +date: 2024-09-13 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_sanchit_gandhi` is a Hindi model originally trained by sanchit-gandhi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_sanchit_gandhi_hi_5.5.0_3.0_1726257360072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_sanchit_gandhi_hi_5.5.0_3.0_1726257360072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_hindi_sanchit_gandhi","hi") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_sanchit_gandhi", "hi") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_sanchit_gandhi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sanchit-gandhi/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_turkish_turkish_movie_reviews_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-14-bert_turkish_turkish_movie_reviews_pipeline_tr.md new file mode 100644 index 00000000000000..1b51278c5a838e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_turkish_turkish_movie_reviews_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish bert_turkish_turkish_movie_reviews_pipeline pipeline BertForSequenceClassification from anilguven +author: John Snow Labs +name: bert_turkish_turkish_movie_reviews_pipeline +date: 2024-09-14 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_turkish_turkish_movie_reviews_pipeline` is a Turkish model originally trained by anilguven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_turkish_turkish_movie_reviews_pipeline_tr_5.5.0_3.0_1726347651521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_turkish_turkish_movie_reviews_pipeline_tr_5.5.0_3.0_1726347651521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_turkish_turkish_movie_reviews_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_turkish_turkish_movie_reviews_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_turkish_turkish_movie_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|414.5 MB| + +## References + +https://huggingface.co/anilguven/bert_tr_turkish_movie_reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bsc_bio_ehr_spanish_livingner3_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-14-bsc_bio_ehr_spanish_livingner3_pipeline_es.md new file mode 100644 index 00000000000000..ae0b151944122a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bsc_bio_ehr_spanish_livingner3_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_livingner3_pipeline pipeline RoBertaForSequenceClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_livingner3_pipeline +date: 2024-09-14 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_livingner3_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_livingner3_pipeline_es_5.5.0_3.0_1726272655991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_livingner3_pipeline_es_5.5.0_3.0_1726272655991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_livingner3_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_livingner3_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_livingner3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|435.4 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-livingner3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_uday1998_en.md b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_uday1998_en.md new file mode 100644 index 00000000000000..b96662d1f8e491 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-burmese_awesome_qa_model_uday1998_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_uday1998 DistilBertForQuestionAnswering from Uday1998 +author: John Snow Labs +name: burmese_awesome_qa_model_uday1998 +date: 2024-09-14 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_uday1998` is a English model originally trained by Uday1998. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_uday1998_en_5.5.0_3.0_1726335910492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_uday1998_en_5.5.0_3.0_1726335910492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_uday1998","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_uday1998", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_uday1998| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Uday1998/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-ct_m2_complete_en.md b/docs/_posts/ahmedlone127/2024-09-14-ct_m2_complete_en.md new file mode 100644 index 00000000000000..fd0dc0348bf862 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-ct_m2_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ct_m2_complete RoBertaEmbeddings from crisistransformers +author: John Snow Labs +name: ct_m2_complete +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ct_m2_complete` is a English model originally trained by crisistransformers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ct_m2_complete_en_5.5.0_3.0_1726334896888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ct_m2_complete_en_5.5.0_3.0_1726334896888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ct_m2_complete","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ct_m2_complete","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ct_m2_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/crisistransformers/CT-M2-Complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-finetune4en_vietnamese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-finetune4en_vietnamese_pipeline_en.md new file mode 100644 index 00000000000000..af3d35638a7b7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-finetune4en_vietnamese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetune4en_vietnamese_pipeline pipeline MarianTransformer from TrinhDacPhu +author: John Snow Labs +name: finetune4en_vietnamese_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune4en_vietnamese_pipeline` is a English model originally trained by TrinhDacPhu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune4en_vietnamese_pipeline_en_5.5.0_3.0_1726350746442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune4en_vietnamese_pipeline_en_5.5.0_3.0_1726350746442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetune4en_vietnamese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetune4en_vietnamese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune4en_vietnamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|474.8 MB| + +## References + +https://huggingface.co/TrinhDacPhu/finetune4en-vi + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-intent_identifier_13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-intent_identifier_13_pipeline_en.md new file mode 100644 index 00000000000000..6d318eac507bd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-intent_identifier_13_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English intent_identifier_13_pipeline pipeline BertForSequenceClassification from dotzero24 +author: John Snow Labs +name: intent_identifier_13_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intent_identifier_13_pipeline` is a English model originally trained by dotzero24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intent_identifier_13_pipeline_en_5.5.0_3.0_1726348003236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intent_identifier_13_pipeline_en_5.5.0_3.0_1726348003236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("intent_identifier_13_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("intent_identifier_13_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intent_identifier_13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|627.8 MB| + +## References + +https://huggingface.co/dotzero24/intent_identifier-13 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en.md new file mode 100644 index 00000000000000..41cc309903c99a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline pipeline MarianTransformer from neerajnigam6 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline` is a English model originally trained by neerajnigam6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en_5.5.0_3.0_1726351590800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline_en_5.5.0_3.0_1726351590800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_neerajnigam6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|523.4 MB| + +## References + +https://huggingface.co/neerajnigam6/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-ner_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-14-ner_roberta_en.md new file mode 100644 index 00000000000000..e514b5c3f5cdaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-ner_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_roberta RoBertaForTokenClassification from textminr +author: John Snow Labs +name: ner_roberta +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_roberta` is a English model originally trained by textminr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_roberta_en_5.5.0_3.0_1726307053675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_roberta_en_5.5.0_3.0_1726307053675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("ner_roberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_roberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|416.0 MB| + +## References + +https://huggingface.co/textminr/ner_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-pig_latin_tonga_tonga_islands_eng_en.md b/docs/_posts/ahmedlone127/2024-09-14-pig_latin_tonga_tonga_islands_eng_en.md new file mode 100644 index 00000000000000..c68210616afec4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-pig_latin_tonga_tonga_islands_eng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English pig_latin_tonga_tonga_islands_eng MarianTransformer from soschuetze +author: John Snow Labs +name: pig_latin_tonga_tonga_islands_eng +date: 2024-09-14 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pig_latin_tonga_tonga_islands_eng` is a English model originally trained by soschuetze. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pig_latin_tonga_tonga_islands_eng_en_5.5.0_3.0_1726350916879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pig_latin_tonga_tonga_islands_eng_en_5.5.0_3.0_1726350916879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("pig_latin_tonga_tonga_islands_eng","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("pig_latin_tonga_tonga_islands_eng","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pig_latin_tonga_tonga_islands_eng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|532.7 MB| + +## References + +https://huggingface.co/soschuetze/pig-latin-to-eng \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en.md new file mode 100644 index 00000000000000..626d94a8b1cd2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline pipeline RoBertaEmbeddings from joheras +author: John Snow Labs +name: roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline` is a English model originally trained by joheras. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en_5.5.0_3.0_1726334130651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline_en_5.5.0_3.0_1726334130651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_biomedical_clinical_spanish_finetuned_clinais_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.4 MB| + +## References + +https://huggingface.co/joheras/roberta-base-biomedical-clinical-es-finetuned-clinais + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_base_epoch_21_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_epoch_21_en.md new file mode 100644 index 00000000000000..3e00848ba315ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_base_epoch_21_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_21 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_21 +date: 2024-09-14 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_21` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_21_en_5.5.0_3.0_1726338081851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_21_en_5.5.0_3.0_1726338081851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_21","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_21","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_21| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_21 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_med_small_1m_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_med_small_1m_3_pipeline_en.md new file mode 100644 index 00000000000000..7a9f87333dc3bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_med_small_1m_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_med_small_1m_3_pipeline pipeline RoBertaEmbeddings from nyu-mll +author: John Snow Labs +name: roberta_med_small_1m_3_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_med_small_1m_3_pipeline` is a English model originally trained by nyu-mll. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_med_small_1m_3_pipeline_en_5.5.0_3.0_1726300290380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_med_small_1m_3_pipeline_en_5.5.0_3.0_1726300290380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_med_small_1m_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_med_small_1m_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_med_small_1m_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|108.0 MB| + +## References + +https://huggingface.co/nyu-mll/roberta-med-small-1M-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-sentiment140_albert_5e_en.md b/docs/_posts/ahmedlone127/2024-09-14-sentiment140_albert_5e_en.md new file mode 100644 index 00000000000000..13d14de6c19029 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-sentiment140_albert_5e_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment140_albert_5e AlbertForSequenceClassification from pig4431 +author: John Snow Labs +name: sentiment140_albert_5e +date: 2024-09-14 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment140_albert_5e` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment140_albert_5e_en_5.5.0_3.0_1726308738284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment140_albert_5e_en_5.5.0_3.0_1726308738284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = AlbertForSequenceClassification.pretrained("sentiment140_albert_5e","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = AlbertForSequenceClassification.pretrained("sentiment140_albert_5e", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment140_albert_5e| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/pig4431/Sentiment140_ALBERT_5E \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-takalane_afr_roberta_af.md b/docs/_posts/ahmedlone127/2024-09-14-takalane_afr_roberta_af.md new file mode 100644 index 00000000000000..7c3271216cce2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-takalane_afr_roberta_af.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Afrikaans takalane_afr_roberta RoBertaEmbeddings from jannesg +author: John Snow Labs +name: takalane_afr_roberta +date: 2024-09-14 +tags: [af, open_source, onnx, embeddings, roberta] +task: Embeddings +language: af +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`takalane_afr_roberta` is a Afrikaans model originally trained by jannesg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/takalane_afr_roberta_af_5.5.0_3.0_1726338343913.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/takalane_afr_roberta_af_5.5.0_3.0_1726338343913.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("takalane_afr_roberta","af") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("takalane_afr_roberta","af") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|takalane_afr_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|af| +|Size:|311.5 MB| + +## References + +https://huggingface.co/jannesg/takalane_afr_roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-test_whisper_tiny_thai_kritchayahir_en.md b/docs/_posts/ahmedlone127/2024-09-14-test_whisper_tiny_thai_kritchayahir_en.md new file mode 100644 index 00000000000000..968731fd39bc73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-test_whisper_tiny_thai_kritchayahir_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English test_whisper_tiny_thai_kritchayahir WhisperForCTC from kritchayaHir +author: John Snow Labs +name: test_whisper_tiny_thai_kritchayahir +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_whisper_tiny_thai_kritchayahir` is a English model originally trained by kritchayaHir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kritchayahir_en_5.5.0_3.0_1726325243051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kritchayahir_en_5.5.0_3.0_1726325243051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_kritchayahir","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("test_whisper_tiny_thai_kritchayahir", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_whisper_tiny_thai_kritchayahir| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/kritchayaHir/test-whisper-tiny-th \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_bengali_kurokabe_bn.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_bengali_kurokabe_bn.md new file mode 100644 index 00000000000000..55bf9f81d0ee5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_bengali_kurokabe_bn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bengali whisper_small_bengali_kurokabe WhisperForCTC from Kurokabe +author: John Snow Labs +name: whisper_small_bengali_kurokabe +date: 2024-09-14 +tags: [bn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bengali_kurokabe` is a Bengali model originally trained by Kurokabe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_kurokabe_bn_5.5.0_3.0_1726330823123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_kurokabe_bn_5.5.0_3.0_1726330823123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_bengali_kurokabe","bn") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_bengali_kurokabe", "bn") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bengali_kurokabe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Kurokabe/whisper-small-bn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cv16_hungarian_v2_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cv16_hungarian_v2_pipeline_hu.md new file mode 100644 index 00000000000000..ef1944df4bc7f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_cv16_hungarian_v2_pipeline_hu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hungarian whisper_small_cv16_hungarian_v2_pipeline pipeline WhisperForCTC from Hungarians +author: John Snow Labs +name: whisper_small_cv16_hungarian_v2_pipeline +date: 2024-09-14 +tags: [hu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cv16_hungarian_v2_pipeline` is a Hungarian model originally trained by Hungarians. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cv16_hungarian_v2_pipeline_hu_5.5.0_3.0_1726274275141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cv16_hungarian_v2_pipeline_hu_5.5.0_3.0_1726274275141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_cv16_hungarian_v2_pipeline", lang = "hu") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_cv16_hungarian_v2_pipeline", lang = "hu") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cv16_hungarian_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Hungarians/whisper-small-cv16-hu-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_fine_tuned_russian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_fine_tuned_russian_pipeline_en.md new file mode 100644 index 00000000000000..5ed447987dba21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_fine_tuned_russian_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_fine_tuned_russian_pipeline pipeline WhisperForCTC from artyomboyko +author: John Snow Labs +name: whisper_small_fine_tuned_russian_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_fine_tuned_russian_pipeline` is a English model originally trained by artyomboyko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_russian_pipeline_en_5.5.0_3.0_1726279343507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_fine_tuned_russian_pipeline_en_5.5.0_3.0_1726279343507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_fine_tuned_russian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_fine_tuned_russian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_fine_tuned_russian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/artyomboyko/whisper-small-fine_tuned-ru + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_mandarin_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_mandarin_en.md new file mode 100644 index 00000000000000..cd7eed038aed2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_mandarin_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_mandarin WhisperForCTC from Wenjian12581 +author: John Snow Labs +name: whisper_small_mandarin +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mandarin` is a English model originally trained by Wenjian12581. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mandarin_en_5.5.0_3.0_1726356226979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mandarin_en_5.5.0_3.0_1726356226979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_mandarin","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_mandarin", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mandarin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Wenjian12581/whisper-small-mandarin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_model_man_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_model_man_en.md new file mode 100644 index 00000000000000..2fe50c69f8a188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_divehi_model_man_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_divehi_model_man WhisperForCTC from model-man +author: John Snow Labs +name: whisper_tiny_divehi_model_man +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_divehi_model_man` is a English model originally trained by model-man. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_model_man_en_5.5.0_3.0_1726297329160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_divehi_model_man_en_5.5.0_3.0_1726297329160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_model_man","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_divehi_model_man", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_divehi_model_man| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/model-man/whisper-tiny-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_thai_wrtzp_en.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_thai_wrtzp_en.md new file mode 100644 index 00000000000000..b24e735532211f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_tiny_thai_wrtzp_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_thai_wrtzp WhisperForCTC from wrtzp +author: John Snow Labs +name: whisper_tiny_thai_wrtzp +date: 2024-09-14 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_thai_wrtzp` is a English model originally trained by wrtzp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_wrtzp_en_5.5.0_3.0_1726278440777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_thai_wrtzp_en_5.5.0_3.0_1726278440777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_thai_wrtzp","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_thai_wrtzp", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_thai_wrtzp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/wrtzp/whisper-tiny-th \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en.md new file mode 100644 index 00000000000000..f4a366f36b89dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline pipeline XlmRoBertaForTokenClassification from wooseok0303 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline` is a English model originally trained by wooseok0303. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en_5.5.0_3.0_1726292540121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline_en_5.5.0_3.0_1726292540121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_wooseok0303_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/wooseok0303/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en.md new file mode 100644 index 00000000000000..36db3a92c94275 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline pipeline XlmRoBertaForTokenClassification from andreaschandra +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline` is a English model originally trained by andreaschandra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en_5.5.0_3.0_1726291721833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline_en_5.5.0_3.0_1726291721833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_andreaschandra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/andreaschandra/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..4cb179eecdf282 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline pipeline RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en_5.5.0_3.0_1726404519698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline_en_5.5.0_3.0_1726404519698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_french_aochildes_2_5m_with_masking_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes-french_aochildes_2.5M-with-Masking-finetuned-SQuAD + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en.md b/docs/_posts/ahmedlone127/2024-09-15-babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en.md new file mode 100644 index 00000000000000..91b97fa9b837a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_wikipedia_french_run3_without_masking_finetuned_qamr RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_wikipedia_french_run3_without_masking_finetuned_qamr +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_wikipedia_french_run3_without_masking_finetuned_qamr` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en_5.5.0_3.0_1726363633994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_wikipedia_french_run3_without_masking_finetuned_qamr_en_5.5.0_3.0_1726363633994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia_french_run3_without_masking_finetuned_qamr","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_wikipedia_french_run3_without_masking_finetuned_qamr", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_wikipedia_french_run3_without_masking_finetuned_qamr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-wikipedia_french-run3-without-Masking-finetuned-QAMR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_gemma2b_multivllm_nodropsus_12_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_gemma2b_multivllm_nodropsus_12_en.md new file mode 100644 index 00000000000000..85b08ca80fe722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_gemma2b_multivllm_nodropsus_12_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_gemma2b_multivllm_nodropsus_12 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_gemma2b_multivllm_nodropsus_12 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_gemma2b_multivllm_nodropsus_12` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_gemma2b_multivllm_nodropsus_12_en_5.5.0_3.0_1726394100231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_gemma2b_multivllm_nodropsus_12_en_5.5.0_3.0_1726394100231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_gemma2b_multivllm_nodropsus_12","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_gemma2b_multivllm_nodropsus_12", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_gemma2b_multivllm_nodropsus_12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_gemma2b-multivllm-NodropSus_12 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-bert_large2_en.md b/docs/_posts/ahmedlone127/2024-09-15-bert_large2_en.md new file mode 100644 index 00000000000000..1b0edb8a78aa07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-bert_large2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large2 RoBertaForSequenceClassification from RogerKam +author: John Snow Labs +name: bert_large2 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large2` is a English model originally trained by RogerKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large2_en_5.5.0_3.0_1726401366524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large2_en_5.5.0_3.0_1726401366524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("bert_large2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bert_large2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|434.3 MB| + +## References + +https://huggingface.co/RogerKam/BERT-large2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_qa_model_anm8edboi_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_qa_model_anm8edboi_en.md new file mode 100644 index 00000000000000..9cb765933e14f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_qa_model_anm8edboi_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_qa_model_anm8edboi DistilBertForQuestionAnswering from anm8edboi +author: John Snow Labs +name: burmese_qa_model_anm8edboi +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_qa_model_anm8edboi` is a English model originally trained by anm8edboi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_qa_model_anm8edboi_en_5.5.0_3.0_1726382680743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_qa_model_anm8edboi_en_5.5.0_3.0_1726382680743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_qa_model_anm8edboi","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_qa_model_anm8edboi", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_qa_model_anm8edboi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/anm8edboi/my_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_sentiment_analysis_model_jay369_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_sentiment_analysis_model_jay369_en.md new file mode 100644 index 00000000000000..32c89b153c9f4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_sentiment_analysis_model_jay369_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_sentiment_analysis_model_jay369 DistilBertForSequenceClassification from Jay369 +author: John Snow Labs +name: burmese_sentiment_analysis_model_jay369 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_sentiment_analysis_model_jay369` is a English model originally trained by Jay369. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_sentiment_analysis_model_jay369_en_5.5.0_3.0_1726385401058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_sentiment_analysis_model_jay369_en_5.5.0_3.0_1726385401058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_sentiment_analysis_model_jay369","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_sentiment_analysis_model_jay369", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_sentiment_analysis_model_jay369| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jay369/my_sentiment_analysis_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-cuad_distil_governing_law_cased_08_31_v1_en.md b/docs/_posts/ahmedlone127/2024-09-15-cuad_distil_governing_law_cased_08_31_v1_en.md new file mode 100644 index 00000000000000..0437c7362ecfe7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-cuad_distil_governing_law_cased_08_31_v1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English cuad_distil_governing_law_cased_08_31_v1 DistilBertForQuestionAnswering from saraks +author: John Snow Labs +name: cuad_distil_governing_law_cased_08_31_v1 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cuad_distil_governing_law_cased_08_31_v1` is a English model originally trained by saraks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cuad_distil_governing_law_cased_08_31_v1_en_5.5.0_3.0_1726382711221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cuad_distil_governing_law_cased_08_31_v1_en_5.5.0_3.0_1726382711221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("cuad_distil_governing_law_cased_08_31_v1","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("cuad_distil_governing_law_cased_08_31_v1", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cuad_distil_governing_law_cased_08_31_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/saraks/cuad-distil-governing_law-cased-08-31-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-cuatr_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-cuatr_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..d892ba26c35966 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-cuatr_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cuatr_distilbert_pipeline pipeline DistilBertForSequenceClassification from chathuru +author: John Snow Labs +name: cuatr_distilbert_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cuatr_distilbert_pipeline` is a English model originally trained by chathuru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cuatr_distilbert_pipeline_en_5.5.0_3.0_1726394184110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cuatr_distilbert_pipeline_en_5.5.0_3.0_1726394184110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cuatr_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cuatr_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cuatr_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chathuru/CuATR-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distil_news_finetune_en.md b/docs/_posts/ahmedlone127/2024-09-15-distil_news_finetune_en.md new file mode 100644 index 00000000000000..12265e04691ebe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distil_news_finetune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distil_news_finetune DistilBertForSequenceClassification from anggari +author: John Snow Labs +name: distil_news_finetune +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_news_finetune` is a English model originally trained by anggari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_news_finetune_en_5.5.0_3.0_1726366306526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_news_finetune_en_5.5.0_3.0_1726366306526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_news_finetune","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_news_finetune", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_news_finetune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anggari/distil_news_finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_fact_updates_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_fact_updates_pipeline_en.md new file mode 100644 index 00000000000000..3a1b98125c5031 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_fact_updates_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_fact_updates_pipeline pipeline DistilBertForSequenceClassification from rishavranaut +author: John Snow Labs +name: distilbert_base_fact_updates_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_fact_updates_pipeline` is a English model originally trained by rishavranaut. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_fact_updates_pipeline_en_5.5.0_3.0_1726394297201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_fact_updates_pipeline_en_5.5.0_3.0_1726394297201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_fact_updates_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_fact_updates_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_fact_updates_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rishavranaut/distilbert-base-fact-updates + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_cased_autext_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_cased_autext_pipeline_en.md new file mode 100644 index 00000000000000..58f0eb467b69e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_spanish_cased_autext_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_spanish_cased_autext_pipeline pipeline DistilBertForSequenceClassification from jorgefg03 +author: John Snow Labs +name: distilbert_base_spanish_cased_autext_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_spanish_cased_autext_pipeline` is a English model originally trained by jorgefg03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_spanish_cased_autext_pipeline_en_5.5.0_3.0_1726394208333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_spanish_cased_autext_pipeline_en_5.5.0_3.0_1726394208333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_spanish_cased_autext_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_spanish_cased_autext_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_spanish_cased_autext_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|238.9 MB| + +## References + +https://huggingface.co/jorgefg03/distilbert-base-es-cased-autext + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_clinc_dohwan9672_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_clinc_dohwan9672_en.md new file mode 100644 index 00000000000000..fc81e69531abaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_clinc_dohwan9672_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_dohwan9672 DistilBertForSequenceClassification from DoHwan9672 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_dohwan9672 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_dohwan9672` is a English model originally trained by DoHwan9672. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dohwan9672_en_5.5.0_3.0_1726406100140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dohwan9672_en_5.5.0_3.0_1726406100140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dohwan9672","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dohwan9672", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_dohwan9672| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/DoHwan9672/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hcyying_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hcyying_en.md new file mode 100644 index 00000000000000..b09fbce68f1703 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hcyying_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hcyying DistilBertForSequenceClassification from hcyying +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hcyying +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hcyying` is a English model originally trained by hcyying. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hcyying_en_5.5.0_3.0_1726394355515.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hcyying_en_5.5.0_3.0_1726394355515.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hcyying","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hcyying", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hcyying| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hcyying/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hschia2_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hschia2_en.md new file mode 100644 index 00000000000000..9d9641f6bb6407 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_emotion_hschia2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hschia2 DistilBertForSequenceClassification from hschia2 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hschia2 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hschia2` is a English model originally trained by hschia2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hschia2_en_5.5.0_3.0_1726385369811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hschia2_en_5.5.0_3.0_1726385369811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hschia2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_hschia2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hschia2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hschia2/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_manik114_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_manik114_pipeline_en.md new file mode 100644 index 00000000000000..0d7e54e36310ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_finetuned_squad_manik114_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_manik114_pipeline pipeline DistilBertForQuestionAnswering from manik114 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_manik114_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_manik114_pipeline` is a English model originally trained by manik114. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_manik114_pipeline_en_5.5.0_3.0_1726382766265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_manik114_pipeline_en_5.5.0_3.0_1726382766265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_manik114_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_manik114_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_manik114_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/manik114/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en.md new file mode 100644 index 00000000000000..bcbd49d7e5696d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en_5.5.0_3.0_1726406018472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100_en_5.5.0_3.0_1726406018472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1_plprefix0stlarge2_simsp400_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut1_PLPrefix0stlarge2_simsp400_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_imdb_sentiment_aniket_jain_9_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_imdb_sentiment_aniket_jain_9_en.md new file mode 100644 index 00000000000000..b9b076527b2729 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_finetuned_imdb_sentiment_aniket_jain_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sentiment_aniket_jain_9 DistilBertForSequenceClassification from aniket-jain-9 +author: John Snow Labs +name: distilbert_finetuned_imdb_sentiment_aniket_jain_9 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sentiment_aniket_jain_9` is a English model originally trained by aniket-jain-9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_aniket_jain_9_en_5.5.0_3.0_1726385194171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_aniket_jain_9_en_5.5.0_3.0_1726385194171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_aniket_jain_9","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_aniket_jain_9", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sentiment_aniket_jain_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aniket-jain-9/distilbert-finetuned-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselineoneepochevaluate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselineoneepochevaluate_pipeline_en.md new file mode 100644 index 00000000000000..f5b32a612a376c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselineoneepochevaluate_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbertbaselineoneepochevaluate_pipeline pipeline DistilBertForQuestionAnswering from KarthikAlagarsamy +author: John Snow Labs +name: distilbertbaselineoneepochevaluate_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertbaselineoneepochevaluate_pipeline` is a English model originally trained by KarthikAlagarsamy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertbaselineoneepochevaluate_pipeline_en_5.5.0_3.0_1726435328354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertbaselineoneepochevaluate_pipeline_en_5.5.0_3.0_1726435328354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbertbaselineoneepochevaluate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbertbaselineoneepochevaluate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertbaselineoneepochevaluate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/KarthikAlagarsamy/distilbertbaselineoneepochevaluate + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselinethreeepochevaluate_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselinethreeepochevaluate_en.md new file mode 100644 index 00000000000000..0dce79c8757870 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbertbaselinethreeepochevaluate_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbertbaselinethreeepochevaluate DistilBertForQuestionAnswering from KarthikAlagarsamy +author: John Snow Labs +name: distilbertbaselinethreeepochevaluate +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertbaselinethreeepochevaluate` is a English model originally trained by KarthikAlagarsamy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertbaselinethreeepochevaluate_en_5.5.0_3.0_1726435016793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertbaselinethreeepochevaluate_en_5.5.0_3.0_1726435016793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbertbaselinethreeepochevaluate","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbertbaselinethreeepochevaluate", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertbaselinethreeepochevaluate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/KarthikAlagarsamy/distilbertbaselinethreeepochevaluate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-fae_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-fae_pipeline_en.md new file mode 100644 index 00000000000000..c2fab022fd10fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-fae_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fae_pipeline pipeline BertEmbeddings from sereneWithU +author: John Snow Labs +name: fae_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fae_pipeline` is a English model originally trained by sereneWithU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fae_pipeline_en_5.5.0_3.0_1726444271544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fae_pipeline_en_5.5.0_3.0_1726444271544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fae_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fae_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/sereneWithU/FAE + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_distilbert_model_steam_game_reviews_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_distilbert_model_steam_game_reviews_en.md new file mode 100644 index 00000000000000..5a6960e605c684 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_distilbert_model_steam_game_reviews_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_distilbert_model_steam_game_reviews DistilBertForSequenceClassification from zitroeth +author: John Snow Labs +name: finetuning_distilbert_model_steam_game_reviews +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_distilbert_model_steam_game_reviews` is a English model originally trained by zitroeth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_distilbert_model_steam_game_reviews_en_5.5.0_3.0_1726393795569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_distilbert_model_steam_game_reviews_en_5.5.0_3.0_1726393795569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_distilbert_model_steam_game_reviews","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_distilbert_model_steam_game_reviews", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_distilbert_model_steam_game_reviews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zitroeth/finetuning-distilbert-model-steam-game-reviews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-gal_ner_portuguese_2_en.md b/docs/_posts/ahmedlone127/2024-09-15-gal_ner_portuguese_2_en.md new file mode 100644 index 00000000000000..7e8ae3d1e37a38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-gal_ner_portuguese_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_ner_portuguese_2 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_portuguese_2 +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_portuguese_2` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_portuguese_2_en_5.5.0_3.0_1726398340638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_portuguese_2_en_5.5.0_3.0_1726398340638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_portuguese_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_portuguese_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_portuguese_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|389.6 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-pt-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-hamsa_tiny_v0_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-hamsa_tiny_v0_8_pipeline_en.md new file mode 100644 index 00000000000000..ea1ebb908d3786 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-hamsa_tiny_v0_8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English hamsa_tiny_v0_8_pipeline pipeline WhisperForCTC from Ahmed107 +author: John Snow Labs +name: hamsa_tiny_v0_8_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hamsa_tiny_v0_8_pipeline` is a English model originally trained by Ahmed107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hamsa_tiny_v0_8_pipeline_en_5.5.0_3.0_1726427608569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hamsa_tiny_v0_8_pipeline_en_5.5.0_3.0_1726427608569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hamsa_tiny_v0_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hamsa_tiny_v0_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hamsa_tiny_v0_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.5 MB| + +## References + +https://huggingface.co/Ahmed107/hamsa-tiny-v0.8 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-lenate_model_5_en.md b/docs/_posts/ahmedlone127/2024-09-15-lenate_model_5_en.md new file mode 100644 index 00000000000000..39c5b4eef0eb6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-lenate_model_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lenate_model_5 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_5 +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_5` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_5_en_5.5.0_3.0_1726385297982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_5_en_5.5.0_3.0_1726385297982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-m7_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-15-m7_mlm_en.md new file mode 100644 index 00000000000000..72f9d2676982f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-m7_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English m7_mlm RoBertaEmbeddings from S2312dal +author: John Snow Labs +name: m7_mlm +date: 2024-09-15 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`m7_mlm` is a English model originally trained by S2312dal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/m7_mlm_en_5.5.0_3.0_1726383499912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/m7_mlm_en_5.5.0_3.0_1726383499912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("m7_mlm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("m7_mlm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|m7_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.7 MB| + +## References + +https://huggingface.co/S2312dal/M7_MLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-pardonmycaption_en.md b/docs/_posts/ahmedlone127/2024-09-15-pardonmycaption_en.md new file mode 100644 index 00000000000000..0bd83665c61985 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-pardonmycaption_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English pardonmycaption DistilBertForSequenceClassification from tarekziade +author: John Snow Labs +name: pardonmycaption +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pardonmycaption` is a English model originally trained by tarekziade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pardonmycaption_en_5.5.0_3.0_1726384787799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pardonmycaption_en_5.5.0_3.0_1726384787799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("pardonmycaption","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("pardonmycaption", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pardonmycaption| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tarekziade/pardonmycaption \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_squad_ngchuchi_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_squad_ngchuchi_en.md new file mode 100644 index 00000000000000..3849c45a083fea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_base_finetuned_squad_ngchuchi_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_base_finetuned_squad_ngchuchi RoBertaForQuestionAnswering from ngchuchi +author: John Snow Labs +name: roberta_base_finetuned_squad_ngchuchi +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_squad_ngchuchi` is a English model originally trained by ngchuchi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_ngchuchi_en_5.5.0_3.0_1726368974488.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_squad_ngchuchi_en_5.5.0_3.0_1726368974488.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_ngchuchi","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_base_finetuned_squad_ngchuchi", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_squad_ngchuchi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|443.2 MB| + +## References + +https://huggingface.co/ngchuchi/roberta-base-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_medquad_2_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_medquad_2_en.md new file mode 100644 index 00000000000000..bbd629f28f8b27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_finetuned_medquad_2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_medquad_2 RoBertaForQuestionAnswering from DataScientist1122 +author: John Snow Labs +name: roberta_finetuned_medquad_2 +date: 2024-09-15 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_medquad_2` is a English model originally trained by DataScientist1122. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_medquad_2_en_5.5.0_3.0_1726363983517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_medquad_2_en_5.5.0_3.0_1726363983517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_medquad_2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_medquad_2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_medquad_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|443.9 MB| + +## References + +https://huggingface.co/DataScientist1122/roberta-finetuned-medquad_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_llm_noninstruct_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_llm_noninstruct_pipeline_en.md new file mode 100644 index 00000000000000..b1787472943dde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_llm_noninstruct_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_llm_noninstruct_pipeline pipeline RoBertaForSequenceClassification from Multiperspective +author: John Snow Labs +name: roberta_llm_noninstruct_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_llm_noninstruct_pipeline` is a English model originally trained by Multiperspective. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_llm_noninstruct_pipeline_en_5.5.0_3.0_1726439689269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_llm_noninstruct_pipeline_en_5.5.0_3.0_1726439689269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_llm_noninstruct_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_llm_noninstruct_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_llm_noninstruct_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Multiperspective/roberta-llm-noninstruct + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en.md new file mode 100644 index 00000000000000..8f8b28a1868f95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline pipeline RoBertaForQuestionAnswering from CNT-UPenn +author: John Snow Labs +name: roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline` is a English model originally trained by CNT-UPenn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en_5.5.0_3.0_1726364233424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline_en_5.5.0_3.0_1726364233424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_RoBERTa_for_seizureFrequency_QA_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/CNT-UPenn/RoBERTa_for_seizureFrequency_QA + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_arabic_camelbert_msa_half_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_arabic_camelbert_msa_half_pipeline_ar.md new file mode 100644 index 00000000000000..8db992a22ee512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_bert_base_arabic_camelbert_msa_half_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_bert_base_arabic_camelbert_msa_half_pipeline pipeline BertSentenceEmbeddings from CAMeL-Lab +author: John Snow Labs +name: sent_bert_base_arabic_camelbert_msa_half_pipeline +date: 2024-09-15 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabic_camelbert_msa_half_pipeline` is a Arabic model originally trained by CAMeL-Lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabic_camelbert_msa_half_pipeline_ar_5.5.0_3.0_1726377533658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabic_camelbert_msa_half_pipeline_ar_5.5.0_3.0_1726377533658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_arabic_camelbert_msa_half_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_arabic_camelbert_msa_half_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabic_camelbert_msa_half_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|406.9 MB| + +## References + +https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-msa-half + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en.md new file mode 100644 index 00000000000000..d9b20a53c7f9f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en_5.5.0_3.0_1726385410407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline_en_5.5.0_3.0_1726385410407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_61_start_exp_time_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-61-START_EXP_TIME + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_cantonese_funpang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_cantonese_funpang_pipeline_en.md new file mode 100644 index 00000000000000..2476552048075d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_cantonese_funpang_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_cantonese_funpang_pipeline pipeline WhisperForCTC from FunPang +author: John Snow Labs +name: whisper_small_cantonese_funpang_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cantonese_funpang_pipeline` is a English model originally trained by FunPang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cantonese_funpang_pipeline_en_5.5.0_3.0_1726411138087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cantonese_funpang_pipeline_en_5.5.0_3.0_1726411138087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_cantonese_funpang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_cantonese_funpang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cantonese_funpang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/FunPang/whisper-small-Cantonese + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_small_spanish_zuazo_es.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_spanish_zuazo_es.md new file mode 100644 index 00000000000000..a8bc2c95d36f2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_small_spanish_zuazo_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish whisper_small_spanish_zuazo WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_small_spanish_zuazo +date: 2024-09-15 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_spanish_zuazo` is a Castilian, Spanish model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_zuazo_es_5.5.0_3.0_1726390289364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_spanish_zuazo_es_5.5.0_3.0_1726390289364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_spanish_zuazo","es") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_spanish_zuazo", "es") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_spanish_zuazo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/zuazo/whisper-small-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_faroese_100h_5k_steps_v2_en.md b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_faroese_100h_5k_steps_v2_en.md new file mode 100644 index 00000000000000..a8b93d707422e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-whisper_tiny_faroese_100h_5k_steps_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_faroese_100h_5k_steps_v2 WhisperForCTC from davidilag +author: John Snow Labs +name: whisper_tiny_faroese_100h_5k_steps_v2 +date: 2024-09-15 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_faroese_100h_5k_steps_v2` is a English model originally trained by davidilag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_100h_5k_steps_v2_en_5.5.0_3.0_1726412161207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_faroese_100h_5k_steps_v2_en_5.5.0_3.0_1726412161207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_100h_5k_steps_v2","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_faroese_100h_5k_steps_v2", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_faroese_100h_5k_steps_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/davidilag/whisper-tiny-fo-100h-5k-steps_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_all_cicimen_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_all_cicimen_en.md new file mode 100644 index 00000000000000..ed5ae70b54a65b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_finetuned_panx_all_cicimen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_cicimen XlmRoBertaForTokenClassification from cicimen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_cicimen +date: 2024-09-15 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_cicimen` is a English model originally trained by cicimen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_cicimen_en_5.5.0_3.0_1726362566584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_cicimen_en_5.5.0_3.0_1726362566584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_cicimen","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_cicimen", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_cicimen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/cicimen/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en.md new file mode 100644 index 00000000000000..39a574c8f10a6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en_5.5.0_3.0_1726442040510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline_en_5.5.0_3.0_1726442040510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_arabic_xnli_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-ar-xnli-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_en.md b/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_en.md new file mode 100644 index 00000000000000..f5ad7da51b0aa0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-2nddeproberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2nddeproberta RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: 2nddeproberta +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2nddeproberta` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2nddeproberta_en_5.5.0_3.0_1726470936268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2nddeproberta_en_5.5.0_3.0_1726470936268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("2nddeproberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("2nddeproberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2nddeproberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/2ndDepRoBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-440fc6ae_75a5_4a7e_a238_65e06e620a59_en.md b/docs/_posts/ahmedlone127/2024-09-16-440fc6ae_75a5_4a7e_a238_65e06e620a59_en.md new file mode 100644 index 00000000000000..47d14d6b458521 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-440fc6ae_75a5_4a7e_a238_65e06e620a59_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 440fc6ae_75a5_4a7e_a238_65e06e620a59 RoBertaForSequenceClassification from IDQO +author: John Snow Labs +name: 440fc6ae_75a5_4a7e_a238_65e06e620a59 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`440fc6ae_75a5_4a7e_a238_65e06e620a59` is a English model originally trained by IDQO. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/440fc6ae_75a5_4a7e_a238_65e06e620a59_en_5.5.0_3.0_1726527948657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/440fc6ae_75a5_4a7e_a238_65e06e620a59_en_5.5.0_3.0_1726527948657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("440fc6ae_75a5_4a7e_a238_65e06e620a59","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("440fc6ae_75a5_4a7e_a238_65e06e620a59", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|440fc6ae_75a5_4a7e_a238_65e06e620a59| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/IDQO/440fc6ae-75a5-4a7e-a238-65e06e620a59 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_en.md b/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_en.md new file mode 100644 index 00000000000000..39243970fb976c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_finetuned_tweet_eval AlbertForSequenceClassification from iaminhridoy +author: John Snow Labs +name: albert_finetuned_tweet_eval +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_finetuned_tweet_eval` is a English model originally trained by iaminhridoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_finetuned_tweet_eval_en_5.5.0_3.0_1726523514014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_finetuned_tweet_eval_en_5.5.0_3.0_1726523514014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_finetuned_tweet_eval","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = AlbertForSequenceClassification.pretrained("albert_finetuned_tweet_eval", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_finetuned_tweet_eval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/iaminhridoy/AlBert-finetuned-Tweet_Eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_travel_7_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_travel_7_16_5_en.md new file mode 100644 index 00000000000000..eae3bc51f317ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-all_roberta_large_v1_travel_7_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_travel_7_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_travel_7_16_5 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_travel_7_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_7_16_5_en_5.5.0_3.0_1726456078562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_7_16_5_en_5.5.0_3.0_1726456078562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_7_16_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_7_16_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_travel_7_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-travel-7-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en.md new file mode 100644 index 00000000000000..25e22d115ba619 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline pipeline DistilBertForSequenceClassification from danielrsn +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline` is a English model originally trained by danielrsn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en_5.5.0_3.0_1726506934043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline_en_5.5.0_3.0_1726506934043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_danielrsn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/danielrsn/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en.md new file mode 100644 index 00000000000000..ce6cb509242275 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_keerthana12_pipeline pipeline DistilBertForQuestionAnswering from Keerthana12 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_keerthana12_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_keerthana12_pipeline` is a English model originally trained by Keerthana12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en_5.5.0_3.0_1726515144302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_keerthana12_pipeline_en_5.5.0_3.0_1726515144302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_keerthana12_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_keerthana12_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_keerthana12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Keerthana12/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-efficient_mlm_m0_40_801010_en.md b/docs/_posts/ahmedlone127/2024-09-16-efficient_mlm_m0_40_801010_en.md new file mode 100644 index 00000000000000..c8c05fe74c6a36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-efficient_mlm_m0_40_801010_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English efficient_mlm_m0_40_801010 RoBertaEmbeddings from princeton-nlp +author: John Snow Labs +name: efficient_mlm_m0_40_801010 +date: 2024-09-16 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_40_801010` is a English model originally trained by princeton-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_801010_en_5.5.0_3.0_1726513885784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_801010_en_5.5.0_3.0_1726513885784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("efficient_mlm_m0_40_801010","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("efficient_mlm_m0_40_801010","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_40_801010| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|845.3 MB| + +## References + +https://huggingface.co/princeton-nlp/efficient_mlm_m0.40-801010 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetuned_adversarial_paraphrase_model_test_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetuned_adversarial_paraphrase_model_test_en.md new file mode 100644 index 00000000000000..943f58bd2d5a67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetuned_adversarial_paraphrase_model_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_adversarial_paraphrase_model_test RoBertaForSequenceClassification from chitra +author: John Snow Labs +name: finetuned_adversarial_paraphrase_model_test +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_adversarial_paraphrase_model_test` is a English model originally trained by chitra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_adversarial_paraphrase_model_test_en_5.5.0_3.0_1726456353416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_adversarial_paraphrase_model_test_en_5.5.0_3.0_1726456353416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_adversarial_paraphrase_model_test","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_adversarial_paraphrase_model_test", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_adversarial_paraphrase_model_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/chitra/finetuned-adversarial-paraphrase-model-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_sarathaer_en.md b/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_sarathaer_en.md new file mode 100644 index 00000000000000..bebb55754145c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-finetuning_sentiment_model_3000_samples_sarathaer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_sarathaer DistilBertForSequenceClassification from Sarathaer +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_sarathaer +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_sarathaer` is a English model originally trained by Sarathaer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sarathaer_en_5.5.0_3.0_1726525624894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sarathaer_en_5.5.0_3.0_1726525624894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_sarathaer","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_sarathaer", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_sarathaer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sarathaer/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-hecone_he.md b/docs/_posts/ahmedlone127/2024-09-16-hecone_he.md new file mode 100644 index 00000000000000..77444523825a53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-hecone_he.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hebrew hecone RoBertaForTokenClassification from HeTree +author: John Snow Labs +name: hecone +date: 2024-09-16 +tags: [he, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hecone` is a Hebrew model originally trained by HeTree. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hecone_he_5.5.0_3.0_1726452820874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hecone_he_5.5.0_3.0_1726452820874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("hecone","he") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("hecone", "he") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hecone| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|he| +|Size:|466.0 MB| + +## References + +https://huggingface.co/HeTree/HeConE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-latin_english_base_aeneid_holdout_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-latin_english_base_aeneid_holdout_pipeline_en.md new file mode 100644 index 00000000000000..b6b4d964839f3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-latin_english_base_aeneid_holdout_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English latin_english_base_aeneid_holdout_pipeline pipeline MarianTransformer from grosenthal +author: John Snow Labs +name: latin_english_base_aeneid_holdout_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`latin_english_base_aeneid_holdout_pipeline` is a English model originally trained by grosenthal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/latin_english_base_aeneid_holdout_pipeline_en_5.5.0_3.0_1726457128114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/latin_english_base_aeneid_holdout_pipeline_en_5.5.0_3.0_1726457128114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("latin_english_base_aeneid_holdout_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("latin_english_base_aeneid_holdout_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|latin_english_base_aeneid_holdout_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|539.7 MB| + +## References + +https://huggingface.co/grosenthal/la_en_base_aeneid_holdout + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-maltese_ach_english_en.md b/docs/_posts/ahmedlone127/2024-09-16-maltese_ach_english_en.md new file mode 100644 index 00000000000000..6ce4f8c99dfaa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-maltese_ach_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maltese_ach_english MarianTransformer from Ogayo +author: John Snow Labs +name: maltese_ach_english +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_ach_english` is a English model originally trained by Ogayo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_ach_english_en_5.5.0_3.0_1726465263736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_ach_english_en_5.5.0_3.0_1726465263736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("maltese_ach_english","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("maltese_ach_english","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_ach_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|465.5 MB| + +## References + +https://huggingface.co/Ogayo/mt-ach-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_spanish_gender_exp_en.md b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_spanish_gender_exp_en.md new file mode 100644 index 00000000000000..4a5da8c8d32aef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-maltese_coref_english_spanish_gender_exp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English maltese_coref_english_spanish_gender_exp MarianTransformer from nlphuji +author: John Snow Labs +name: maltese_coref_english_spanish_gender_exp +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_coref_english_spanish_gender_exp` is a English model originally trained by nlphuji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_coref_english_spanish_gender_exp_en_5.5.0_3.0_1726457295217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_coref_english_spanish_gender_exp_en_5.5.0_3.0_1726457295217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("maltese_coref_english_spanish_gender_exp","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("maltese_coref_english_spanish_gender_exp","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_coref_english_spanish_gender_exp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.3 MB| + +## References + +https://huggingface.co/nlphuji/mt_coref_en_es_gender_exp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en.md new file mode 100644 index 00000000000000..f02cc668c4986e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate MarianTransformer from ecat3rina +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate` is a English model originally trained by ecat3rina. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en_5.5.0_3.0_1726491011715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate_en_5.5.0_3.0_1726491011715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_romanian_accelerate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/ecat3rina/marian-finetuned-kde4-en-to-ro-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-16-nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..b4ac7d9a86b297 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726527132370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726527132370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random1_seed1_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random1_seed1-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-nofibot2_en.md b/docs/_posts/ahmedlone127/2024-09-16-nofibot2_en.md new file mode 100644 index 00000000000000..35552cbc9830c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-nofibot2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English nofibot2 DistilBertForQuestionAnswering from aslakeinbu +author: John Snow Labs +name: nofibot2 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nofibot2` is a English model originally trained by aslakeinbu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nofibot2_en_5.5.0_3.0_1726515244749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nofibot2_en_5.5.0_3.0_1726515244749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("nofibot2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("nofibot2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nofibot2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/aslakeinbu/nofibot2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_base_ailem_random_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_base_ailem_random_pipeline_en.md new file mode 100644 index 00000000000000..077c73e9506cbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_base_ailem_random_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_base_ailem_random_pipeline pipeline MarianTransformer from ethansimrm +author: John Snow Labs +name: opus_base_ailem_random_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_base_ailem_random_pipeline` is a English model originally trained by ethansimrm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_base_ailem_random_pipeline_en_5.5.0_3.0_1726457143837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_base_ailem_random_pipeline_en_5.5.0_3.0_1726457143837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_base_ailem_random_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_base_ailem_random_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_base_ailem_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.0 MB| + +## References + +https://huggingface.co/ethansimrm/opus_base_ailem_random + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en.md new file mode 100644 index 00000000000000..0b80b3ebd99438 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan MarianTransformer from fxshan +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan` is a English model originally trained by fxshan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en_5.5.0_3.0_1726491579349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan_en_5.5.0_3.0_1726491579349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_fxshan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.5 MB| + +## References + +https://huggingface.co/fxshan/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en.md new file mode 100644 index 00000000000000..652562dc721606 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline pipeline MarianTransformer from polaris79 +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline` is a English model originally trained by polaris79. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en_5.5.0_3.0_1726457552561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline_en_5.5.0_3.0_1726457552561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_polaris79_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|509.0 MB| + +## References + +https://huggingface.co/polaris79/opus-mt-en-ro-finetuned-en-to-ro + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en.md new file mode 100644 index 00000000000000..c50acd4220925a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp MarianTransformer from yeshanp +author: John Snow Labs +name: opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp` is a English model originally trained by yeshanp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en_5.5.0_3.0_1726509926502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp_en_5.5.0_3.0_1726509926502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_romanian_finetuned_english_tonga_tonga_islands_romanian_yeshanp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.6 MB| + +## References + +https://huggingface.co/yeshanp/opus-mt-en-ro-finetuned-en-to-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en.md new file mode 100644 index 00000000000000..f5c354c1091b6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline pipeline MarianTransformer from mekjr1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline` is a English model originally trained by mekjr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en_5.5.0_3.0_1726457504703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline_en_5.5.0_3.0_1726457504703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_spanish_tonga_tonga_islands_maq_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.4 MB| + +## References + +https://huggingface.co/mekjr1/opus-mt-en-es-finetuned-es-to-maq-v2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en.md b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en.md new file mode 100644 index 00000000000000..4484448c208065 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2 MarianTransformer from Culmenus +author: John Snow Labs +name: opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2 +date: 2024-09-16 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2` is a English model originally trained by Culmenus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en_5.5.0_3.0_1726465759767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2_en_5.5.0_3.0_1726465759767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_german_icelandic_finetuned_german_tonga_tonga_islands_icelandic_nr2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.5 MB| + +## References + +https://huggingface.co/Culmenus/opus-mt-de-is-finetuned-de-to-is_nr2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-personal_whisper_medium_english_model_en.md b/docs/_posts/ahmedlone127/2024-09-16-personal_whisper_medium_english_model_en.md new file mode 100644 index 00000000000000..e7c64feaabb96c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-personal_whisper_medium_english_model_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English personal_whisper_medium_english_model WhisperForCTC from fractalego +author: John Snow Labs +name: personal_whisper_medium_english_model +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`personal_whisper_medium_english_model` is a English model originally trained by fractalego. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/personal_whisper_medium_english_model_en_5.5.0_3.0_1726481837719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/personal_whisper_medium_english_model_en_5.5.0_3.0_1726481837719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("personal_whisper_medium_english_model","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("personal_whisper_medium_english_model", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|personal_whisper_medium_english_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/fractalego/personal-whisper-medium.en-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-platzi_distilroberta_base_mrpc_glue_rafa_rivera_en.md b/docs/_posts/ahmedlone127/2024-09-16-platzi_distilroberta_base_mrpc_glue_rafa_rivera_en.md new file mode 100644 index 00000000000000..0d556261d2ed3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-platzi_distilroberta_base_mrpc_glue_rafa_rivera_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_rafa_rivera RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_rafa_rivera +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_rafa_rivera` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_rafa_rivera_en_5.5.0_3.0_1726518905897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_rafa_rivera_en_5.5.0_3.0_1726518905897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_rafa_rivera","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_rafa_rivera", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_rafa_rivera| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-rafa-rivera \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_squad_model1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_squad_model1_pipeline_en.md new file mode 100644 index 00000000000000..986c2de54de8b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_squad_model1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_base_squad_model1_pipeline pipeline RoBertaForQuestionAnswering from varun-v-rao +author: John Snow Labs +name: roberta_base_squad_model1_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_squad_model1_pipeline` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_squad_model1_pipeline_en_5.5.0_3.0_1726460724456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_squad_model1_pipeline_en_5.5.0_3.0_1726460724456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_squad_model1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_squad_model1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_squad_model1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|461.9 MB| + +## References + +https://huggingface.co/varun-v-rao/roberta-base-squad-model1 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_base_tweet_topic_single_2020_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_tweet_topic_single_2020_en.md new file mode 100644 index 00000000000000..a116356ed68ce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_base_tweet_topic_single_2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_tweet_topic_single_2020 RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_base_tweet_topic_single_2020 +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_tweet_topic_single_2020` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_single_2020_en_5.5.0_3.0_1726518139101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_single_2020_en_5.5.0_3.0_1726518139101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_tweet_topic_single_2020","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_tweet_topic_single_2020", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_tweet_topic_single_2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|436.4 MB| + +## References + +https://huggingface.co/cardiffnlp/roberta-base-tweet-topic-single-2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_subjqa_movies_2_mohamed13579_en.md b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_subjqa_movies_2_mohamed13579_en.md new file mode 100644 index 00000000000000..7a2168c8e66ce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-roberta_finetuned_subjqa_movies_2_mohamed13579_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_mohamed13579 RoBertaForQuestionAnswering from mohamed13579 +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_mohamed13579 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_mohamed13579` is a English model originally trained by mohamed13579. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_mohamed13579_en_5.5.0_3.0_1726460339490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_mohamed13579_en_5.5.0_3.0_1726460339490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_mohamed13579","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_mohamed13579", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_mohamed13579| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/mohamed13579/roberta-finetuned-subjqa-movies_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-vietnamese_whisper_small_en.md b/docs/_posts/ahmedlone127/2024-09-16-vietnamese_whisper_small_en.md new file mode 100644 index 00000000000000..69b164b9346f1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-vietnamese_whisper_small_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English vietnamese_whisper_small WhisperForCTC from DuyTa +author: John Snow Labs +name: vietnamese_whisper_small +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`vietnamese_whisper_small` is a English model originally trained by DuyTa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vietnamese_whisper_small_en_5.5.0_3.0_1726485857648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/vietnamese_whisper_small_en_5.5.0_3.0_1726485857648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("vietnamese_whisper_small","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("vietnamese_whisper_small", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|vietnamese_whisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DuyTa/vi_whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_medium_english_atco2_asr_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_medium_english_atco2_asr_en.md new file mode 100644 index 00000000000000..ebff425ae4636b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_medium_english_atco2_asr_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_english_atco2_asr WhisperForCTC from jlvdoorn +author: John Snow Labs +name: whisper_medium_english_atco2_asr +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_english_atco2_asr` is a English model originally trained by jlvdoorn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_english_atco2_asr_en_5.5.0_3.0_1726481838556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_english_atco2_asr_en_5.5.0_3.0_1726481838556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_medium_english_atco2_asr","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_medium_english_atco2_asr", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_english_atco2_asr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/jlvdoorn/whisper-medium.en-atco2-asr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_sfedar_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_sfedar_en.md new file mode 100644 index 00000000000000..3140f5998cda9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_english_us_sfedar_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_english_us_sfedar WhisperForCTC from sfedar +author: John Snow Labs +name: whisper_tiny_english_us_sfedar +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_us_sfedar` is a English model originally trained by sfedar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sfedar_en_5.5.0_3.0_1726478952211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_us_sfedar_en_5.5.0_3.0_1726478952211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_english_us_sfedar","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_english_us_sfedar", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_us_sfedar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/sfedar/whisper-tiny-en-US \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_v4_small_en.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_v4_small_en.md new file mode 100644 index 00000000000000..530cb2a04d5e0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_v4_small_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_v4_small WhisperForCTC from karinthommen +author: John Snow Labs +name: whisper_v4_small +date: 2024-09-16 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_v4_small` is a English model originally trained by karinthommen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_v4_small_en_5.5.0_3.0_1726488189979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_v4_small_en_5.5.0_3.0_1726488189979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_v4_small","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_v4_small", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_v4_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/karinthommen/whisper-V4-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_jhagege_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_jhagege_en.md new file mode 100644 index 00000000000000..6707c432b17d25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_all_jhagege_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jhagege XlmRoBertaForTokenClassification from jhagege +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jhagege +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_jhagege` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jhagege_en_5.5.0_3.0_1726495880745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jhagege_en_5.5.0_3.0_1726495880745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jhagege","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jhagege", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jhagege| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.8 MB| + +## References + +https://huggingface.co/jhagege/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en.md new file mode 100644 index 00000000000000..074980dcd18426 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline pipeline XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en_5.5.0_3.0_1726495548234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline_en_5.5.0_3.0_1726495548234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_param_mehta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|817.2 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_brouwer_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_brouwer_en.md new file mode 100644 index 00000000000000..045b77cec1fbae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_brouwer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_brouwer XlmRoBertaForTokenClassification from brouwer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_brouwer +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_brouwer` is a English model originally trained by brouwer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_brouwer_en_5.5.0_3.0_1726495909340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_brouwer_en_5.5.0_3.0_1726495909340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_brouwer","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_brouwer", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_brouwer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|574.7 MB| + +## References + +https://huggingface.co/brouwer/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en.md new file mode 100644 index 00000000000000..1c197a59f9a20e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline pipeline XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline` is a English model originally trained by tyayoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en_5.5.0_3.0_1726495659438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline_en_5.5.0_3.0_1726495659438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_tyayoi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en.md new file mode 100644 index 00000000000000..9da8884e7c7c6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_nadle_pipeline pipeline XlmRoBertaForTokenClassification from nadle +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_nadle_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_nadle_pipeline` is a English model originally trained by nadle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en_5.5.0_3.0_1726495342585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nadle_pipeline_en_5.5.0_3.0_1726495342585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_nadle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_nadle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_nadle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.9 MB| + +## References + +https://huggingface.co/nadle/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-09_distilbert_qa_pytorch_full_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-09_distilbert_qa_pytorch_full_pipeline_en.md new file mode 100644 index 00000000000000..8194d8d3de642d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-09_distilbert_qa_pytorch_full_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English 09_distilbert_qa_pytorch_full_pipeline pipeline DistilBertForQuestionAnswering from tyavika +author: John Snow Labs +name: 09_distilbert_qa_pytorch_full_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`09_distilbert_qa_pytorch_full_pipeline` is a English model originally trained by tyavika. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/09_distilbert_qa_pytorch_full_pipeline_en_5.5.0_3.0_1726574960075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/09_distilbert_qa_pytorch_full_pipeline_en_5.5.0_3.0_1726574960075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("09_distilbert_qa_pytorch_full_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("09_distilbert_qa_pytorch_full_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|09_distilbert_qa_pytorch_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/tyavika/09-Distilbert-QA-Pytorch-FULL + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-argument_classification_ibm_spanish_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-argument_classification_ibm_spanish_roberta_pipeline_en.md new file mode 100644 index 00000000000000..c2ff2fb37dd04b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-argument_classification_ibm_spanish_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English argument_classification_ibm_spanish_roberta_pipeline pipeline RoBertaForSequenceClassification from anhuu +author: John Snow Labs +name: argument_classification_ibm_spanish_roberta_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`argument_classification_ibm_spanish_roberta_pipeline` is a English model originally trained by anhuu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/argument_classification_ibm_spanish_roberta_pipeline_en_5.5.0_3.0_1726591007566.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/argument_classification_ibm_spanish_roberta_pipeline_en_5.5.0_3.0_1726591007566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("argument_classification_ibm_spanish_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("argument_classification_ibm_spanish_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|argument_classification_ibm_spanish_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|444.7 MB| + +## References + +https://huggingface.co/anhuu/argument_classification_ibm_es_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en.md new file mode 100644 index 00000000000000..cb510d49276de5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en_5.5.0_3.0_1726567378845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650_en_5.5.0_3.0_1726567378845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904175650| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904175650 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en.md new file mode 100644 index 00000000000000..25609a82d0bdb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en_5.5.0_3.0_1726567469175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline_en_5.5.0_3.0_1726567469175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_4e_06_dp_0_1_swati_0_southern_sotho_false_fh_false_hs_700_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-4e-06-dp-0.1-ss-0-st-False-fh-False-hs-700 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bert_classifier_sead_l_6_h_256_a_8_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-17-bert_classifier_sead_l_6_h_256_a_8_mrpc_en.md new file mode 100644 index 00000000000000..43250338458527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bert_classifier_sead_l_6_h_256_a_8_mrpc_en.md @@ -0,0 +1,111 @@ +--- +layout: model +title: English BertForSequenceClassification Cased model (from course5i) +author: John Snow Labs +name: bert_classifier_sead_l_6_h_256_a_8_mrpc +date: 2024-09-17 +tags: [en, open_source, bert, sequence_classification, classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `SEAD-L-6_H-256_A-8-mrpc` is a English model originally trained by `course5i`. + +## Predicted Entities + +`0`, `1` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_sead_l_6_h_256_a_8_mrpc_en_5.5.0_3.0_1726604598811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_sead_l_6_h_256_a_8_mrpc_en_5.5.0_3.0_1726604598811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +seq_classifier = BertForSequenceClassification.pretrained("bert_classifier_sead_l_6_h_256_a_8_mrpc","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("class") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, seq_classifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCols(Array("text")) + .setOutputCols(Array("document")) + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val seq_classifier = BertForSequenceClassification.pretrained("bert_classifier_sead_l_6_h_256_a_8_mrpc","en") + .setInputCols(Array("document", "token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, seq_classifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.classify.bert.glue.6l_256d_a8a_256d").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_sead_l_6_h_256_a_8_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|47.3 MB| + +## References + +References + +- https://huggingface.co/course5i/SEAD-L-6_H-256_A-8-mrpc +- https://arxiv.org/abs/1910.01108 +- https://arxiv.org/abs/1909.10351 +- https://arxiv.org/abs/2002.10957 +- https://arxiv.org/abs/1810.04805 +- https://arxiv.org/abs/1804.07461 +- https://arxiv.org/abs/1905.00537 +- https://www.adasci.org/journals/lattice-35309407/?volumes=true&open=621a3b18edc4364e8a96cb63 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_livingner1_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_livingner1_pipeline_es.md new file mode 100644 index 00000000000000..4ef515fb61e67d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-bsc_bio_ehr_spanish_livingner1_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_livingner1_pipeline pipeline RoBertaForTokenClassification from IIC +author: John Snow Labs +name: bsc_bio_ehr_spanish_livingner1_pipeline +date: 2024-09-17 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_livingner1_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_livingner1_pipeline_es_5.5.0_3.0_1726538175580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_livingner1_pipeline_es_5.5.0_3.0_1726538175580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_livingner1_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_livingner1_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_livingner1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|437.9 MB| + +## References + +https://huggingface.co/IIC/bsc-bio-ehr-es-livingner1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_massiaz_en.md b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_massiaz_en.md new file mode 100644 index 00000000000000..d8fcfb27cf621c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-burmese_awesome_model_massiaz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_massiaz DistilBertForSequenceClassification from massiaz +author: John Snow Labs +name: burmese_awesome_model_massiaz +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_massiaz` is a English model originally trained by massiaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_massiaz_en_5.5.0_3.0_1726584285388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_massiaz_en_5.5.0_3.0_1726584285388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_massiaz","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_massiaz", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_massiaz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/massiaz/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en.md new file mode 100644 index 00000000000000..b059cde926f66c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English chinese_roberta_wwm_ext_2_0_8_ddp_pipeline pipeline BertForQuestionAnswering from DaydreamerF +author: John Snow Labs +name: chinese_roberta_wwm_ext_2_0_8_ddp_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_roberta_wwm_ext_2_0_8_ddp_pipeline` is a English model originally trained by DaydreamerF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en_5.5.0_3.0_1726544249381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_2_0_8_ddp_pipeline_en_5.5.0_3.0_1726544249381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chinese_roberta_wwm_ext_2_0_8_ddp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chinese_roberta_wwm_ext_2_0_8_ddp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_roberta_wwm_ext_2_0_8_ddp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/DaydreamerF/chinese-roberta-wwm-ext-2.0-8-ddp + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-class_poems_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-17-class_poems_spanish_en.md new file mode 100644 index 00000000000000..b4dd6812a2dd6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-class_poems_spanish_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English class_poems_spanish RoBertaForSequenceClassification from hackathon-pln-es +author: John Snow Labs +name: class_poems_spanish +date: 2024-09-17 +tags: [roberta, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`class_poems_spanish` is a English model originally trained by hackathon-pln-es. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/class_poems_spanish_en_5.5.0_3.0_1726574081914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/class_poems_spanish_en_5.5.0_3.0_1726574081914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("class_poems_spanish","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("class_poems_spanish","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|class_poems_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.1 MB| + +## References + +References + +https://huggingface.co/hackathon-pln-es/class-poems-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_atunass_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_atunass_pipeline_en.md new file mode 100644 index 00000000000000..13faa169968ad6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_squad_atunass_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_atunass_pipeline pipeline DistilBertForQuestionAnswering from aTunass +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_atunass_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_atunass_pipeline` is a English model originally trained by aTunass. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_atunass_pipeline_en_5.5.0_3.0_1726599987992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_atunass_pipeline_en_5.5.0_3.0_1726599987992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_atunass_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_atunass_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_atunass_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/aTunass/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p80_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p80_en.md new file mode 100644 index 00000000000000..9042d06cf75d36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p80_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p80 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p80 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p80` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p80_en_5.5.0_3.0_1726599606155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p80_en_5.5.0_3.0_1726599606155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p80","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p80", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p80| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|139.2 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p80 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_uncased_assamese_hungarian_f1_score_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_uncased_assamese_hungarian_f1_score_en.md new file mode 100644 index 00000000000000..24e3d7a4e339cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_uncased_assamese_hungarian_f1_score_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_uncased_assamese_hungarian_f1_score DistilBertForSequenceClassification from raulgdp +author: John Snow Labs +name: distilbert_uncased_assamese_hungarian_f1_score +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_uncased_assamese_hungarian_f1_score` is a English model originally trained by raulgdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_uncased_assamese_hungarian_f1_score_en_5.5.0_3.0_1726584852801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_uncased_assamese_hungarian_f1_score_en_5.5.0_3.0_1726584852801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_uncased_assamese_hungarian_f1_score","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_uncased_assamese_hungarian_f1_score", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_uncased_assamese_hungarian_f1_score| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/raulgdp/Distilbert-uncased-AS-HU-f1-score \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-dl_xlm_roberta_base10_en.md b/docs/_posts/ahmedlone127/2024-09-17-dl_xlm_roberta_base10_en.md new file mode 100644 index 00000000000000..6a80b87550f713 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-dl_xlm_roberta_base10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dl_xlm_roberta_base10 XlmRoBertaForSequenceClassification from mohammad-osoolian +author: John Snow Labs +name: dl_xlm_roberta_base10 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dl_xlm_roberta_base10` is a English model originally trained by mohammad-osoolian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dl_xlm_roberta_base10_en_5.5.0_3.0_1726616463735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dl_xlm_roberta_base10_en_5.5.0_3.0_1726616463735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("dl_xlm_roberta_base10","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("dl_xlm_roberta_base10", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dl_xlm_roberta_base10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.3 MB| + +## References + +https://huggingface.co/mohammad-osoolian/DL-xlm-roberta-base10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-edos_2023_baseline_xlm_roberta_base_label_category_en.md b/docs/_posts/ahmedlone127/2024-09-17-edos_2023_baseline_xlm_roberta_base_label_category_en.md new file mode 100644 index 00000000000000..44712751d8c444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-edos_2023_baseline_xlm_roberta_base_label_category_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English edos_2023_baseline_xlm_roberta_base_label_category XlmRoBertaForSequenceClassification from lct-rug-2022 +author: John Snow Labs +name: edos_2023_baseline_xlm_roberta_base_label_category +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`edos_2023_baseline_xlm_roberta_base_label_category` is a English model originally trained by lct-rug-2022. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/edos_2023_baseline_xlm_roberta_base_label_category_en_5.5.0_3.0_1726616482492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/edos_2023_baseline_xlm_roberta_base_label_category_en_5.5.0_3.0_1726616482492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("edos_2023_baseline_xlm_roberta_base_label_category","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("edos_2023_baseline_xlm_roberta_base_label_category", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|edos_2023_baseline_xlm_roberta_base_label_category| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|654.5 MB| + +## References + +https://huggingface.co/lct-rug-2022/edos-2023-baseline-xlm-roberta-base-label_category \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_hunnyopenxcell_en.md b/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_hunnyopenxcell_en.md new file mode 100644 index 00000000000000..f31bfb194d2ce6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-finetuning_sentiment_model_3000_samples_hunnyopenxcell_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_hunnyopenxcell DistilBertForSequenceClassification from hunnyopenxcell +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_hunnyopenxcell +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_hunnyopenxcell` is a English model originally trained by hunnyopenxcell. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_hunnyopenxcell_en_5.5.0_3.0_1726584360531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_hunnyopenxcell_en_5.5.0_3.0_1726584360531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_hunnyopenxcell","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_hunnyopenxcell", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_hunnyopenxcell| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hunnyopenxcell/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-ft_mod_all_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-ft_mod_all_pipeline_en.md new file mode 100644 index 00000000000000..1cb14b3affdfec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-ft_mod_all_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English ft_mod_all_pipeline pipeline DistilBertForQuestionAnswering from Saty2hoty +author: John Snow Labs +name: ft_mod_all_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_mod_all_pipeline` is a English model originally trained by Saty2hoty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_mod_all_pipeline_en_5.5.0_3.0_1726586397621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_mod_all_pipeline_en_5.5.0_3.0_1726586397621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ft_mod_all_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ft_mod_all_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_mod_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Saty2hoty/FT_mod_all + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-influence_tactic_paper_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-influence_tactic_paper_pipeline_en.md new file mode 100644 index 00000000000000..30387790777856 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-influence_tactic_paper_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English influence_tactic_paper_pipeline pipeline BertForSequenceClassification from InfluenceTactics +author: John Snow Labs +name: influence_tactic_paper_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`influence_tactic_paper_pipeline` is a English model originally trained by InfluenceTactics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/influence_tactic_paper_pipeline_en_5.5.0_3.0_1726604789831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/influence_tactic_paper_pipeline_en_5.5.0_3.0_1726604789831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("influence_tactic_paper_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("influence_tactic_paper_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|influence_tactic_paper_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/InfluenceTactics/Influence_Tactic_Paper + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-korean_better_old_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-korean_better_old_pipeline_en.md new file mode 100644 index 00000000000000..77d3c8cd1c4697 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-korean_better_old_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English korean_better_old_pipeline pipeline WhisperForCTC from phucd +author: John Snow Labs +name: korean_better_old_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korean_better_old_pipeline` is a English model originally trained by phucd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korean_better_old_pipeline_en_5.5.0_3.0_1726570550146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korean_better_old_pipeline_en_5.5.0_3.0_1726570550146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("korean_better_old_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("korean_better_old_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korean_better_old_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/phucd/ko-better-old + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-lab1_finetuning_yimeiyang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-lab1_finetuning_yimeiyang_pipeline_en.md new file mode 100644 index 00000000000000..cf88b6de7c3193 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-lab1_finetuning_yimeiyang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_finetuning_yimeiyang_pipeline pipeline MarianTransformer from yimeiyang +author: John Snow Labs +name: lab1_finetuning_yimeiyang_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_finetuning_yimeiyang_pipeline` is a English model originally trained by yimeiyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_finetuning_yimeiyang_pipeline_en_5.5.0_3.0_1726582225303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_finetuning_yimeiyang_pipeline_en_5.5.0_3.0_1726582225303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab1_finetuning_yimeiyang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab1_finetuning_yimeiyang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_finetuning_yimeiyang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/yimeiyang/lab1_finetuning + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-lab2_adam_reshphil23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-lab2_adam_reshphil23_pipeline_en.md new file mode 100644 index 00000000000000..88b5e610a0ac56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-lab2_adam_reshphil23_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab2_adam_reshphil23_pipeline pipeline MarianTransformer from reshphil23 +author: John Snow Labs +name: lab2_adam_reshphil23_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_adam_reshphil23_pipeline` is a English model originally trained by reshphil23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_adam_reshphil23_pipeline_en_5.5.0_3.0_1726533352218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_adam_reshphil23_pipeline_en_5.5.0_3.0_1726533352218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab2_adam_reshphil23_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab2_adam_reshphil23_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_adam_reshphil23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.9 MB| + +## References + +https://huggingface.co/reshphil23/lab2_adam + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-maltese_hitz_spanish_basque_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-17-maltese_hitz_spanish_basque_pipeline_es.md new file mode 100644 index 00000000000000..db92373f0437c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-maltese_hitz_spanish_basque_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish maltese_hitz_spanish_basque_pipeline pipeline MarianTransformer from HiTZ +author: John Snow Labs +name: maltese_hitz_spanish_basque_pipeline +date: 2024-09-17 +tags: [es, open_source, pipeline, onnx] +task: Translation +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`maltese_hitz_spanish_basque_pipeline` is a Castilian, Spanish model originally trained by HiTZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/maltese_hitz_spanish_basque_pipeline_es_5.5.0_3.0_1726581748715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/maltese_hitz_spanish_basque_pipeline_es_5.5.0_3.0_1726581748715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("maltese_hitz_spanish_basque_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("maltese_hitz_spanish_basque_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|maltese_hitz_spanish_basque_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|225.9 MB| + +## References + +https://huggingface.co/HiTZ/mt-hitz-es-eu + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-nlp_herbalmultilabelclassification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-nlp_herbalmultilabelclassification_pipeline_en.md new file mode 100644 index 00000000000000..51d96d4f78abbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-nlp_herbalmultilabelclassification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_herbalmultilabelclassification_pipeline pipeline DistilBertForSequenceClassification from khygopole +author: John Snow Labs +name: nlp_herbalmultilabelclassification_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_herbalmultilabelclassification_pipeline` is a English model originally trained by khygopole. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_herbalmultilabelclassification_pipeline_en_5.5.0_3.0_1726584184084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_herbalmultilabelclassification_pipeline_en_5.5.0_3.0_1726584184084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp_herbalmultilabelclassification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp_herbalmultilabelclassification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_herbalmultilabelclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/khygopole/NLP_HerbalMultilabelClassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en.md new file mode 100644 index 00000000000000..7f9c650feb25a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs MarianTransformer from UnassumingOwl +author: John Snow Labs +name: opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs` is a English model originally trained by UnassumingOwl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726532810812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs_en_5.5.0_3.0_1726532810812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_azerbaijani_english_finetuned_npomo_english_10_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|301.2 MB| + +## References + +https://huggingface.co/UnassumingOwl/opus-mt-az-en-finetuned-npomo-en-10-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en.md new file mode 100644 index 00000000000000..8236c47b240219 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1 MarianTransformer from maaaaaa1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1 +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1` is a English model originally trained by maaaaaa1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en_5.5.0_3.0_1726533333436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_en_5.5.0_3.0_1726533333436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|540.0 MB| + +## References + +https://huggingface.co/maaaaaa1/opus-mt-en-es-finetuned-en-to-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-pft_clf_finetuned_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-17-pft_clf_finetuned_pipeline_fa.md new file mode 100644 index 00000000000000..b42d64771f582e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-pft_clf_finetuned_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian pft_clf_finetuned_pipeline pipeline BertForSequenceClassification from amirhossein1376 +author: John Snow Labs +name: pft_clf_finetuned_pipeline +date: 2024-09-17 +tags: [fa, open_source, pipeline, onnx] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pft_clf_finetuned_pipeline` is a Persian model originally trained by amirhossein1376. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pft_clf_finetuned_pipeline_fa_5.5.0_3.0_1726604787693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pft_clf_finetuned_pipeline_fa_5.5.0_3.0_1726604787693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pft_clf_finetuned_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pft_clf_finetuned_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pft_clf_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|443.9 MB| + +## References + +https://huggingface.co/amirhossein1376/pft-clf-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-pipeline1model3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-pipeline1model3_pipeline_en.md new file mode 100644 index 00000000000000..522f3a2edeadad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-pipeline1model3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English pipeline1model3_pipeline pipeline WhisperForCTC from avery0 +author: John Snow Labs +name: pipeline1model3_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pipeline1model3_pipeline` is a English model originally trained by avery0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline1model3_pipeline_en_5.5.0_3.0_1726563527611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pipeline1model3_pipeline_en_5.5.0_3.0_1726563527611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pipeline1model3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pipeline1model3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pipeline1model3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.0 MB| + +## References + +https://huggingface.co/avery0/pipeline1model3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-psthebert_en.md b/docs/_posts/ahmedlone127/2024-09-17-psthebert_en.md new file mode 100644 index 00000000000000..a6abd1479b5334 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-psthebert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English psthebert RoBertaForSequenceClassification from eevvgg +author: John Snow Labs +name: psthebert +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`psthebert` is a English model originally trained by eevvgg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/psthebert_en_5.5.0_3.0_1726591283485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/psthebert_en_5.5.0_3.0_1726591283485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("psthebert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("psthebert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|psthebert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/eevvgg/PsTheBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-quac_qa_bert_srddev_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-quac_qa_bert_srddev_pipeline_en.md new file mode 100644 index 00000000000000..5ce24e0c31999e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-quac_qa_bert_srddev_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English quac_qa_bert_srddev_pipeline pipeline BertForQuestionAnswering from SRDdev +author: John Snow Labs +name: quac_qa_bert_srddev_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quac_qa_bert_srddev_pipeline` is a English model originally trained by SRDdev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quac_qa_bert_srddev_pipeline_en_5.5.0_3.0_1726554338784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quac_qa_bert_srddev_pipeline_en_5.5.0_3.0_1726554338784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("quac_qa_bert_srddev_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("quac_qa_bert_srddev_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quac_qa_bert_srddev_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/SRDdev/QuAC-QA-BERT + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_pipeline_en.md new file mode 100644 index 00000000000000..3d6ce441d071ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-question_answearing_7_distillbert_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English question_answearing_7_distillbert_pipeline pipeline DistilBertForQuestionAnswering from Meziane +author: John Snow Labs +name: question_answearing_7_distillbert_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answearing_7_distillbert_pipeline` is a English model originally trained by Meziane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answearing_7_distillbert_pipeline_en_5.5.0_3.0_1726586700692.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answearing_7_distillbert_pipeline_en_5.5.0_3.0_1726586700692.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("question_answearing_7_distillbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("question_answearing_7_distillbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answearing_7_distillbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Meziane/question_answearing_7_distillbert + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_pipeline_en.md new file mode 100644 index 00000000000000..342eec1fbe2b84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-robbert_2023_dutch_large_ft_lcn_actua_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_2023_dutch_large_ft_lcn_actua_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: robbert_2023_dutch_large_ft_lcn_actua_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_large_ft_lcn_actua_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_ft_lcn_actua_pipeline_en_5.5.0_3.0_1726603075725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_large_ft_lcn_actua_pipeline_en_5.5.0_3.0_1726603075725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_2023_dutch_large_ft_lcn_actua_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_2023_dutch_large_ft_lcn_actua_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_large_ft_lcn_actua_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/btamm12/robbert-2023-dutch-large-ft-lcn-actua + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_wallisian_manual_1ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_wallisian_manual_1ep_pipeline_en.md new file mode 100644 index 00000000000000..1deed2bad5b489 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-roberta_base_finetuned_wallisian_manual_1ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_1ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_1ep_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_1ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_1ep_pipeline_en_5.5.0_3.0_1726602873252.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_1ep_pipeline_en_5.5.0_3.0_1726602873252.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_1ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_1ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_1ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-1ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_pipeline_he.md b/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_pipeline_he.md new file mode 100644 index 00000000000000..2c50e7e01398ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_dictabert_tiny_pipeline_he.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Hebrew sent_dictabert_tiny_pipeline pipeline BertSentenceEmbeddings from dicta-il +author: John Snow Labs +name: sent_dictabert_tiny_pipeline +date: 2024-09-17 +tags: [he, open_source, pipeline, onnx] +task: Embeddings +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dictabert_tiny_pipeline` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dictabert_tiny_pipeline_he_5.5.0_3.0_1726587222694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dictabert_tiny_pipeline_he_5.5.0_3.0_1726587222694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_dictabert_tiny_pipeline", lang = "he") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_dictabert_tiny_pipeline", lang = "he") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dictabert_tiny_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|he| +|Size:|108.9 MB| + +## References + +https://huggingface.co/dicta-il/dictabert-tiny + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-tinymax_en.md b/docs/_posts/ahmedlone127/2024-09-17-tinymax_en.md new file mode 100644 index 00000000000000..b9a15ff9f48d22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-tinymax_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English tinymax WhisperForCTC from tabsadem +author: John Snow Labs +name: tinymax +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinymax` is a English model originally trained by tabsadem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinymax_en_5.5.0_3.0_1726549971269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinymax_en_5.5.0_3.0_1726549971269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("tinymax","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("tinymax", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinymax| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|392.0 MB| + +## References + +https://huggingface.co/tabsadem/tinyMAX \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_encod_vietmed_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_encod_vietmed_pipeline_vi.md new file mode 100644 index 00000000000000..febae1fb4cebad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_encod_vietmed_pipeline_vi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Vietnamese whisper_base_encod_vietmed_pipeline pipeline WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_base_encod_vietmed_pipeline +date: 2024-09-17 +tags: [vi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_encod_vietmed_pipeline` is a Vietnamese model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_encod_vietmed_pipeline_vi_5.5.0_3.0_1726547805888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_encod_vietmed_pipeline_vi_5.5.0_3.0_1726547805888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_encod_vietmed_pipeline", lang = "vi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_encod_vietmed_pipeline", lang = "vi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_encod_vietmed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|641.7 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-base-Encod-vietmed + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_xhosa_za_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_xhosa_za_pipeline_en.md new file mode 100644 index 00000000000000..64413d8ab68f6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_xhosa_za_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_xhosa_za_pipeline pipeline WhisperForCTC from julie200 +author: John Snow Labs +name: whisper_small_xhosa_za_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_xhosa_za_pipeline` is a English model originally trained by julie200. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_xhosa_za_pipeline_en_5.5.0_3.0_1726550925273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_xhosa_za_pipeline_en_5.5.0_3.0_1726550925273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_xhosa_za_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_xhosa_za_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_xhosa_za_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/julie200/whisper-small-xh_za + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_engmed_v2_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_engmed_v2_en.md new file mode 100644 index 00000000000000..2f80cde9442c0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_tiny_engmed_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_engmed_v2 WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_tiny_engmed_v2 +date: 2024-09-17 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_engmed_v2` is a English model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_engmed_v2_en_5.5.0_3.0_1726547743868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_engmed_v2_en_5.5.0_3.0_1726547743868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_engmed_v2","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_engmed_v2", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_engmed_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|378.0 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-tiny-engmed-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-wikidata_researchers_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-17-wikidata_researchers_classifier_en.md new file mode 100644 index 00000000000000..724165f0f13991 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-wikidata_researchers_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wikidata_researchers_classifier RoBertaForSequenceClassification from matthewleechen +author: John Snow Labs +name: wikidata_researchers_classifier +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wikidata_researchers_classifier` is a English model originally trained by matthewleechen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikidata_researchers_classifier_en_5.5.0_3.0_1726573797690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikidata_researchers_classifier_en_5.5.0_3.0_1726573797690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("wikidata_researchers_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("wikidata_researchers_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikidata_researchers_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/matthewleechen/wikidata_researchers_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_en.md new file mode 100644 index 00000000000000..04f1dadc06d23e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_param_mehta XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_param_mehta +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_param_mehta` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_param_mehta_en_5.5.0_3.0_1726577051273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_param_mehta_en_5.5.0_3.0_1726577051273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_param_mehta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_param_mehta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_param_mehta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|849.3 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_smilingface88_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_smilingface88_en.md new file mode 100644 index 00000000000000..93b8e381860cbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_smilingface88_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_smilingface88 XlmRoBertaForTokenClassification from smilingface88 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_smilingface88 +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_smilingface88` is a English model originally trained by smilingface88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_smilingface88_en_5.5.0_3.0_1726611287054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_smilingface88_en_5.5.0_3.0_1726611287054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_smilingface88","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_smilingface88", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_smilingface88| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/smilingface88/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_gulermuslim_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_gulermuslim_en.md new file mode 100644 index 00000000000000..e96d5c1f509ca2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_gulermuslim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_gulermuslim XlmRoBertaForTokenClassification from gulermuslim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_gulermuslim +date: 2024-09-17 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_gulermuslim` is a English model originally trained by gulermuslim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gulermuslim_en_5.5.0_3.0_1726612115887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gulermuslim_en_5.5.0_3.0_1726612115887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_gulermuslim","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_gulermuslim", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_gulermuslim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/gulermuslim/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_mongolian_ner_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_mongolian_ner_pipeline_mn.md new file mode 100644 index 00000000000000..217abc2f131b2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_mongolian_ner_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian xlm_roberta_base_mongolian_ner_pipeline pipeline XlmRoBertaForTokenClassification from nemuwn +author: John Snow Labs +name: xlm_roberta_base_mongolian_ner_pipeline +date: 2024-09-17 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_mongolian_ner_pipeline` is a Mongolian model originally trained by nemuwn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mongolian_ner_pipeline_mn_5.5.0_3.0_1726576566495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mongolian_ner_pipeline_mn_5.5.0_3.0_1726576566495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_mongolian_ner_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_mongolian_ner_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_mongolian_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|842.2 MB| + +## References + +https://huggingface.co/nemuwn/xlm-roberta-base-mongolian-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_ukraine_waray_philippines_pov_v1_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_ukraine_waray_philippines_pov_v1_en.md new file mode 100644 index 00000000000000..c69976ddab762f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_ukraine_waray_philippines_pov_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_ukraine_waray_philippines_pov_v1 XlmRoBertaForSequenceClassification from YaraKyrychenko +author: John Snow Labs +name: xlm_roberta_base_ukraine_waray_philippines_pov_v1 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ukraine_waray_philippines_pov_v1` is a English model originally trained by YaraKyrychenko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_pov_v1_en_5.5.0_3.0_1726536005629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ukraine_waray_philippines_pov_v1_en_5.5.0_3.0_1726536005629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_ukraine_waray_philippines_pov_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_ukraine_waray_philippines_pov_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ukraine_waray_philippines_pov_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|875.0 MB| + +## References + +https://huggingface.co/YaraKyrychenko/xlm-roberta-base-ukraine-war-pov-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-5718_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-5718_5_pipeline_en.md new file mode 100644 index 00000000000000..2d40d8f60fb4ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-5718_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 5718_5_pipeline pipeline DistilBertForSequenceClassification from mhpanju +author: John Snow Labs +name: 5718_5_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`5718_5_pipeline` is a English model originally trained by mhpanju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/5718_5_pipeline_en_5.5.0_3.0_1726695913671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/5718_5_pipeline_en_5.5.0_3.0_1726695913671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("5718_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("5718_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|5718_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mhpanju/5718_5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ag_news_roberta_large_seed_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-ag_news_roberta_large_seed_3_pipeline_en.md new file mode 100644 index 00000000000000..9dbe5b21d7de48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ag_news_roberta_large_seed_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ag_news_roberta_large_seed_3_pipeline pipeline RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: ag_news_roberta_large_seed_3_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_roberta_large_seed_3_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_roberta_large_seed_3_pipeline_en_5.5.0_3.0_1726628476743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_roberta_large_seed_3_pipeline_en_5.5.0_3.0_1726628476743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ag_news_roberta_large_seed_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ag_news_roberta_large_seed_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_roberta_large_seed_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/utahnlp/ag_news_roberta-large_seed-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ai_generated_text_classification_sanalsprasad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-ai_generated_text_classification_sanalsprasad_pipeline_en.md new file mode 100644 index 00000000000000..4a5394a874dd2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ai_generated_text_classification_sanalsprasad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ai_generated_text_classification_sanalsprasad_pipeline pipeline RoBertaForSequenceClassification from sanalsprasad +author: John Snow Labs +name: ai_generated_text_classification_sanalsprasad_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_generated_text_classification_sanalsprasad_pipeline` is a English model originally trained by sanalsprasad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_sanalsprasad_pipeline_en_5.5.0_3.0_1726622504436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_sanalsprasad_pipeline_en_5.5.0_3.0_1726622504436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ai_generated_text_classification_sanalsprasad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ai_generated_text_classification_sanalsprasad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_generated_text_classification_sanalsprasad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/sanalsprasad/ai-generated-text-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-amharicnewscharacternormalizedunweighted_en.md b/docs/_posts/ahmedlone127/2024-09-18-amharicnewscharacternormalizedunweighted_en.md new file mode 100644 index 00000000000000..56132746ae30fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-amharicnewscharacternormalizedunweighted_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amharicnewscharacternormalizedunweighted XlmRoBertaForSequenceClassification from akiseid +author: John Snow Labs +name: amharicnewscharacternormalizedunweighted +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amharicnewscharacternormalizedunweighted` is a English model originally trained by akiseid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amharicnewscharacternormalizedunweighted_en_5.5.0_3.0_1726697303669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amharicnewscharacternormalizedunweighted_en_5.5.0_3.0_1726697303669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("amharicnewscharacternormalizedunweighted","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("amharicnewscharacternormalizedunweighted", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amharicnewscharacternormalizedunweighted| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|840.7 MB| + +## References + +https://huggingface.co/akiseid/AmharicNewsCharacterNormalizedUnWeighted \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_ar.md b/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_ar.md new file mode 100644 index 00000000000000..77ac2b2f03a29a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic arabert_arabic_ner_conllpp BertForTokenClassification from MostafaAhmed98 +author: John Snow Labs +name: arabert_arabic_ner_conllpp +date: 2024-09-18 +tags: [ar, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabert_arabic_ner_conllpp` is a Arabic model originally trained by MostafaAhmed98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_conllpp_ar_5.5.0_3.0_1726673933993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_conllpp_ar_5.5.0_3.0_1726673933993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("arabert_arabic_ner_conllpp","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("arabert_arabic_ner_conllpp", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabert_arabic_ner_conllpp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|505.1 MB| + +## References + +https://huggingface.co/MostafaAhmed98/AraBert-Arabic-NER-CoNLLpp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_pipeline_ar.md new file mode 100644 index 00000000000000..543ca033a36bbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-arabert_arabic_ner_conllpp_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic arabert_arabic_ner_conllpp_pipeline pipeline BertForTokenClassification from MostafaAhmed98 +author: John Snow Labs +name: arabert_arabic_ner_conllpp_pipeline +date: 2024-09-18 +tags: [ar, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arabert_arabic_ner_conllpp_pipeline` is a Arabic model originally trained by MostafaAhmed98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_conllpp_pipeline_ar_5.5.0_3.0_1726673958857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabert_arabic_ner_conllpp_pipeline_ar_5.5.0_3.0_1726673958857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("arabert_arabic_ner_conllpp_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("arabert_arabic_ner_conllpp_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabert_arabic_ner_conllpp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|505.1 MB| + +## References + +https://huggingface.co/MostafaAhmed98/AraBert-Arabic-NER-CoNLLpp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bbc_news_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-18-bbc_news_classifier_en.md new file mode 100644 index 00000000000000..bfd94224239d5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bbc_news_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bbc_news_classifier RoBertaForSequenceClassification from chrisliu298 +author: John Snow Labs +name: bbc_news_classifier +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bbc_news_classifier` is a English model originally trained by chrisliu298. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bbc_news_classifier_en_5.5.0_3.0_1726689604579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bbc_news_classifier_en_5.5.0_3.0_1726689604579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("bbc_news_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bbc_news_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bbc_news_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|451.0 MB| + +## References + +https://huggingface.co/chrisliu298/bbc_news_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_arabert_finetuned_mdeberta_tswana_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_arabert_finetuned_mdeberta_tswana_en.md new file mode 100644 index 00000000000000..cb1d054818be0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_arabert_finetuned_mdeberta_tswana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_arabert_finetuned_mdeberta_tswana BertEmbeddings from betteib +author: John Snow Labs +name: bert_base_arabert_finetuned_mdeberta_tswana +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_arabert_finetuned_mdeberta_tswana` is a English model originally trained by betteib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_arabert_finetuned_mdeberta_tswana_en_5.5.0_3.0_1726700395207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_arabert_finetuned_mdeberta_tswana_en_5.5.0_3.0_1726700395207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_arabert_finetuned_mdeberta_tswana","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_arabert_finetuned_mdeberta_tswana","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_arabert_finetuned_mdeberta_tswana| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|504.6 MB| + +## References + +https://huggingface.co/betteib/bert-base-arabert-finetuned-mdeberta-tn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_uncased_finetuned_squad_summerzhang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_uncased_finetuned_squad_summerzhang_pipeline_en.md new file mode 100644 index 00000000000000..80e560e40eb408 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_uncased_finetuned_squad_summerzhang_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad_summerzhang_pipeline pipeline BertForQuestionAnswering from SummerZhang +author: John Snow Labs +name: bert_base_uncased_finetuned_squad_summerzhang_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad_summerzhang_pipeline` is a English model originally trained by SummerZhang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_summerzhang_pipeline_en_5.5.0_3.0_1726658697869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_summerzhang_pipeline_en_5.5.0_3.0_1726658697869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_squad_summerzhang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_squad_summerzhang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad_summerzhang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/SummerZhang/bert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_alexanderaziz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_alexanderaziz_pipeline_en.md new file mode 100644 index 00000000000000..6db8293e567898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_alexanderaziz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_alexanderaziz_pipeline pipeline DistilBertForSequenceClassification from alexanderaziz +author: John Snow Labs +name: burmese_awesome_model_alexanderaziz_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_alexanderaziz_pipeline` is a English model originally trained by alexanderaziz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_alexanderaziz_pipeline_en_5.5.0_3.0_1726630182614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_alexanderaziz_pipeline_en_5.5.0_3.0_1726630182614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_alexanderaziz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_alexanderaziz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_alexanderaziz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/alexanderaziz/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_darkshark77_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_darkshark77_pipeline_en.md new file mode 100644 index 00000000000000..af20126a28670d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_darkshark77_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_darkshark77_pipeline pipeline DistilBertForSequenceClassification from darkshark77 +author: John Snow Labs +name: burmese_awesome_model_darkshark77_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_darkshark77_pipeline` is a English model originally trained by darkshark77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_darkshark77_pipeline_en_5.5.0_3.0_1726695655958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_darkshark77_pipeline_en_5.5.0_3.0_1726695655958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_darkshark77_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_darkshark77_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_darkshark77_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/darkshark77/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dguywill_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dguywill_en.md new file mode 100644 index 00000000000000..bf4089a6ccc2c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_dguywill_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_dguywill DistilBertForSequenceClassification from dguywill +author: John Snow Labs +name: burmese_awesome_model_dguywill +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_dguywill` is a English model originally trained by dguywill. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dguywill_en_5.5.0_3.0_1726694888765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_dguywill_en_5.5.0_3.0_1726694888765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dguywill","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_dguywill", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_dguywill| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dguywill/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_priority_2_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_priority_2_en.md new file mode 100644 index 00000000000000..8ee2c79df6669a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_priority_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_priority_2 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: burmese_awesome_model_priority_2 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_priority_2` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_2_en_5.5.0_3.0_1726630985864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_2_en_5.5.0_3.0_1726630985864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_priority_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_priority_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_priority_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/my_awesome_model_priority_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_qa_model_ahmed13245_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_qa_model_ahmed13245_en.md new file mode 100644 index 00000000000000..1ed1e6957d4502 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_qa_model_ahmed13245_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_ahmed13245 DistilBertForQuestionAnswering from AHMED13245 +author: John Snow Labs +name: burmese_awesome_qa_model_ahmed13245 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_ahmed13245` is a English model originally trained by AHMED13245. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ahmed13245_en_5.5.0_3.0_1726641008931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_ahmed13245_en_5.5.0_3.0_1726641008931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ahmed13245","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_ahmed13245", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_ahmed13245| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/AHMED13245/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_distilbert_on_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_distilbert_on_imdb_en.md new file mode 100644 index 00000000000000..e310f41c1217cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_finetuned_distilbert_on_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_finetuned_distilbert_on_imdb DistilBertForSequenceClassification from gslshbs +author: John Snow Labs +name: burmese_finetuned_distilbert_on_imdb +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_finetuned_distilbert_on_imdb` is a English model originally trained by gslshbs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_finetuned_distilbert_on_imdb_en_5.5.0_3.0_1726680500836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_finetuned_distilbert_on_imdb_en_5.5.0_3.0_1726680500836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_distilbert_on_imdb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_finetuned_distilbert_on_imdb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_finetuned_distilbert_on_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gslshbs/my_finetuned_DistilBERT_on_IMDb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_not_somali_awesome_model_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_not_somali_awesome_model_en.md new file mode 100644 index 00000000000000..354b5299f6efc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_not_somali_awesome_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_not_somali_awesome_model DistilBertForSequenceClassification from baris-yazici +author: John Snow Labs +name: burmese_not_somali_awesome_model +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_not_somali_awesome_model` is a English model originally trained by baris-yazici. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_not_somali_awesome_model_en_5.5.0_3.0_1726682050486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_not_somali_awesome_model_en_5.5.0_3.0_1726682050486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_not_somali_awesome_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_not_somali_awesome_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_not_somali_awesome_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/baris-yazici/my_not_so_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-coha1980s_en.md b/docs/_posts/ahmedlone127/2024-09-18-coha1980s_en.md new file mode 100644 index 00000000000000..f01fd757f14ae9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-coha1980s_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English coha1980s RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1980s +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1980s` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1980s_en_5.5.0_3.0_1726678702941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1980s_en_5.5.0_3.0_1726678702941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("coha1980s","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("coha1980s","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1980s| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.4 MB| + +## References + +https://huggingface.co/simonmun/COHA1980s \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_en.md b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_en.md new file mode 100644 index 00000000000000..20099b1a5bd1ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr15_seed3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr15_seed3 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr15_seed3 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr15_seed3` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr15_seed3_en_5.5.0_3.0_1726649677590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr15_seed3_en_5.5.0_3.0_1726649677590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr15_seed3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr15_seed3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr15_seed3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr15-seed3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-disaster_tweets_electra_small_en.md b/docs/_posts/ahmedlone127/2024-09-18-disaster_tweets_electra_small_en.md new file mode 100644 index 00000000000000..25c6cab0205384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-disaster_tweets_electra_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English disaster_tweets_electra_small RoBertaForSequenceClassification from Arsive +author: John Snow Labs +name: disaster_tweets_electra_small +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweets_electra_small` is a English model originally trained by Arsive. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweets_electra_small_en_5.5.0_3.0_1726622033084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweets_electra_small_en_5.5.0_3.0_1726622033084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweets_electra_small","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweets_electra_small", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweets_electra_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Arsive/disaster_tweets_electra_small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en.md new file mode 100644 index 00000000000000..dedad29c71a9e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_credit_cards_zphr_0st72_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_credit_cards_zphr_0st72_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_credit_cards_zphr_0st72_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en_5.5.0_3.0_1726680431065.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_credit_cards_zphr_0st72_pipeline_en_5.5.0_3.0_1726680431065.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_credit_cards_zphr_0st72_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_credit_cards_zphr_0st72_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_credit_cards_zphr_0st72_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_credit_cards_zphr_0st72 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en.md new file mode 100644 index 00000000000000..f16a1da47a2a2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline pipeline DistilBertForQuestionAnswering from jkhsong +author: John Snow Labs +name: distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline` is a English model originally trained by jkhsong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en_5.5.0_3.0_1726644163384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline_en_5.5.0_3.0_1726644163384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distil_fine_on_bioasq_50_50_shuffled_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/jkhsong/distilbert-base-uncased-distil-fine-on-bioasq-50-50-shuffled + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_cezeozue_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_cezeozue_en.md new file mode 100644 index 00000000000000..b37ca3ee94b1c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_cezeozue_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_cezeozue DistilBertForSequenceClassification from cezeozue +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_cezeozue +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_cezeozue` is a English model originally trained by cezeozue. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_cezeozue_en_5.5.0_3.0_1726696195007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_cezeozue_en_5.5.0_3.0_1726696195007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_cezeozue","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_cezeozue", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_cezeozue| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/cezeozue/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_seddiktrk_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_seddiktrk_en.md new file mode 100644 index 00000000000000..96c344ad337d74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_distilled_clinc_seddiktrk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_seddiktrk DistilBertForSequenceClassification from seddiktrk +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_seddiktrk +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_seddiktrk` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_seddiktrk_en_5.5.0_3.0_1726681953985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_seddiktrk_en_5.5.0_3.0_1726681953985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_seddiktrk","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_seddiktrk", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_seddiktrk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/seddiktrk/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_en.md new file mode 100644 index 00000000000000..200371e2d53711 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_enneagram DistilBertForSequenceClassification from LandersonMiguel +author: John Snow Labs +name: distilbert_base_uncased_enneagram +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_enneagram` is a English model originally trained by LandersonMiguel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_enneagram_en_5.5.0_3.0_1726669975661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_enneagram_en_5.5.0_3.0_1726669975661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_enneagram","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_enneagram", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_enneagram| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LandersonMiguel/distilbert-base-uncased-enneagram \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_balus_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_balus_en.md new file mode 100644 index 00000000000000..f23f5749f2e355 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_balus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_balus DistilBertForSequenceClassification from balus +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_balus +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_balus` is a English model originally trained by balus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_balus_en_5.5.0_3.0_1726696538129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_balus_en_5.5.0_3.0_1726696538129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_balus","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_balus", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_balus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/balus/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hfdsajkfd_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hfdsajkfd_en.md new file mode 100644 index 00000000000000..1d4224c78aaccc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_hfdsajkfd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_hfdsajkfd DistilBertForSequenceClassification from hfdsajkfd +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_hfdsajkfd +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_hfdsajkfd` is a English model originally trained by hfdsajkfd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hfdsajkfd_en_5.5.0_3.0_1726630281320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_hfdsajkfd_en_5.5.0_3.0_1726630281320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_hfdsajkfd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_hfdsajkfd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_hfdsajkfd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hfdsajkfd/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_whoopwhoop_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_whoopwhoop_en.md new file mode 100644 index 00000000000000..9787e58bf8487d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_whoopwhoop_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_whoopwhoop DistilBertForSequenceClassification from whoopwhoop +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_whoopwhoop +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_whoopwhoop` is a English model originally trained by whoopwhoop. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_whoopwhoop_en_5.5.0_3.0_1726695439703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_whoopwhoop_en_5.5.0_3.0_1726695439703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_whoopwhoop","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_whoopwhoop", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_whoopwhoop| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/whoopwhoop/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en.md new file mode 100644 index 00000000000000..c813f2a89f9984 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline pipeline DistilBertForSequenceClassification from bentanweihao +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline` is a English model originally trained by bentanweihao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en_5.5.0_3.0_1726625653971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline_en_5.5.0_3.0_1726625653971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_bentanweihao_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bentanweihao/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en.md new file mode 100644 index 00000000000000..0fe12c0da2e1f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline pipeline DistilBertForSequenceClassification from hitoshiNagaoka +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline` is a English model originally trained by hitoshiNagaoka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en_5.5.0_3.0_1726696034810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline_en_5.5.0_3.0_1726696034810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hitoshinagaoka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hitoshiNagaoka/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_jennifer0804_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_jennifer0804_en.md new file mode 100644 index 00000000000000..573db6dcb6b647 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_jennifer0804_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jennifer0804 DistilBertForSequenceClassification from jennifer0804 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jennifer0804 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jennifer0804` is a English model originally trained by jennifer0804. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jennifer0804_en_5.5.0_3.0_1726681404724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jennifer0804_en_5.5.0_3.0_1726681404724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jennifer0804","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jennifer0804", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jennifer0804| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jennifer0804/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_rick72x5_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_rick72x5_en.md new file mode 100644 index 00000000000000..75c3b16fecb6e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_rick72x5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rick72x5 DistilBertForSequenceClassification from rick72x5 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rick72x5 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rick72x5` is a English model originally trained by rick72x5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rick72x5_en_5.5.0_3.0_1726677136216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rick72x5_en_5.5.0_3.0_1726677136216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_rick72x5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_rick72x5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rick72x5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rick72x5/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en.md new file mode 100644 index 00000000000000..9d7cf4bbaa6795 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_sj1011_pipeline pipeline DistilBertForSequenceClassification from SJ1011 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_sj1011_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_sj1011_pipeline` is a English model originally trained by SJ1011. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en_5.5.0_3.0_1726680823445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_sj1011_pipeline_en_5.5.0_3.0_1726680823445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sj1011_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_sj1011_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_sj1011_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SJ1011/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en.md new file mode 100644 index 00000000000000..165289b075a61a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline pipeline DistilBertForSequenceClassification from uijeong01 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline` is a English model originally trained by uijeong01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en_5.5.0_3.0_1726669866463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline_en_5.5.0_3.0_1726669866463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_uijeong01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/uijeong01/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uisikdag_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uisikdag_en.md new file mode 100644 index 00000000000000..13ba3293e2c3a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_uisikdag_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_uisikdag DistilBertForSequenceClassification from uisikdag +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_uisikdag +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_uisikdag` is a English model originally trained by uisikdag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uisikdag_en_5.5.0_3.0_1726696426600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_uisikdag_en_5.5.0_3.0_1726696426600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_uisikdag","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_uisikdag", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_uisikdag| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/uisikdag/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotions_jjwariror_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotions_jjwariror_en.md new file mode 100644 index 00000000000000..db2d7635908509 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotions_jjwariror_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_jjwariror DistilBertForSequenceClassification from JJWariror +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_jjwariror +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_jjwariror` is a English model originally trained by JJWariror. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjwariror_en_5.5.0_3.0_1726695901905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjwariror_en_5.5.0_3.0_1726695901905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_jjwariror","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotions_jjwariror", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_jjwariror| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JJWariror/distilbert-base-uncased-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en.md new file mode 100644 index 00000000000000..96a3eccf9d47bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_reach_seller_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_reach_seller_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_reach_seller_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en_5.5.0_3.0_1726694873187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_reach_seller_pipeline_en_5.5.0_3.0_1726694873187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_m_reach_seller_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_m_reach_seller_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_reach_seller_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_reach_seller + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_cinoss_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_cinoss_en.md new file mode 100644 index 00000000000000..4dc24cead2587a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_cinoss_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_cinoss DistilBertForQuestionAnswering from cinoss +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_cinoss +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_cinoss` is a English model originally trained by cinoss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_cinoss_en_5.5.0_3.0_1726640727903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_cinoss_en_5.5.0_3.0_1726640727903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_cinoss","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_cinoss", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_cinoss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/cinoss/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en.md new file mode 100644 index 00000000000000..c480a25899e953 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726680128923.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline_en_5.5.0_3.0_1726680128923.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_kitchen_and_dining_zphr_0st72_ut52ut1_plain_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..4ddb717dcee2b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en_5.5.0_3.0_1726680315037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline_en_5.5.0_3.0_1726680315037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st3sd_ut72ut1_plprefix0stlarge_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st3sd_ut72ut1_PLPrefix0stlarge_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md new file mode 100644 index 00000000000000..1c1a061bb1909f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726630381060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726630381060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md new file mode 100644 index 00000000000000..57e8f83c4e63d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726681369576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1726681369576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st8sd_ut72ut5_plprefix0stlarge_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st8sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en.md new file mode 100644 index 00000000000000..dce8d95f8d946c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_lora_merged_p30_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_lora_merged_p30_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_lora_merged_p30_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en_5.5.0_3.0_1726644157355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_lora_merged_p30_pipeline_en_5.5.0_3.0_1726644157355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_lora_merged_p30_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_lora_merged_p30_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_lora_merged_p30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|213.2 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-lora-merged-p30 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_text_classification_v6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_text_classification_v6_pipeline_en.md new file mode 100644 index 00000000000000..41d84a8576fbce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_text_classification_v6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_text_classification_v6_pipeline pipeline DistilBertForSequenceClassification from arjuntheprogrammer +author: John Snow Labs +name: distilbert_base_uncased_text_classification_v6_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_text_classification_v6_pipeline` is a English model originally trained by arjuntheprogrammer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_text_classification_v6_pipeline_en_5.5.0_3.0_1726669839593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_text_classification_v6_pipeline_en_5.5.0_3.0_1726669839593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_text_classification_v6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_text_classification_v6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_text_classification_v6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/arjuntheprogrammer/distilbert-base-uncased-text-classification-v6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_birads_eco_mamo_1_descartado_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_birads_eco_mamo_1_descartado_pipeline_en.md new file mode 100644 index 00000000000000..91bb86fa3e70e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_birads_eco_mamo_1_descartado_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_birads_eco_mamo_1_descartado_pipeline pipeline DistilBertForSequenceClassification from sara-m98 +author: John Snow Labs +name: distilbert_birads_eco_mamo_1_descartado_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_birads_eco_mamo_1_descartado_pipeline` is a English model originally trained by sara-m98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_birads_eco_mamo_1_descartado_pipeline_en_5.5.0_3.0_1726676880035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_birads_eco_mamo_1_descartado_pipeline_en_5.5.0_3.0_1726676880035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_birads_eco_mamo_1_descartado_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_birads_eco_mamo_1_descartado_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_birads_eco_mamo_1_descartado_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sara-m98/DISTILBERT_BIRADS_ECO_MAMO_1_DESCARTADO + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_padding80model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_padding80model_pipeline_en.md new file mode 100644 index 00000000000000..7588ed499b7470 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_padding80model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_padding80model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_imdb_padding80model_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_padding80model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding80model_pipeline_en_5.5.0_3.0_1726680243855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding80model_pipeline_en_5.5.0_3.0_1726680243855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_imdb_padding80model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_imdb_padding80model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_padding80model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_imdb_padding80model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_ranzuh_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_ranzuh_en.md new file mode 100644 index 00000000000000..5203ce11d060af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_imdb_ranzuh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_ranzuh DistilBertForSequenceClassification from ranzuh +author: John Snow Labs +name: distilbert_imdb_ranzuh +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_ranzuh` is a English model originally trained by ranzuh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_ranzuh_en_5.5.0_3.0_1726677195608.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_ranzuh_en_5.5.0_3.0_1726677195608.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_ranzuh","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_ranzuh", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_ranzuh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ranzuh/distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en.md new file mode 100644 index 00000000000000..5b8799db8d5401 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en_5.5.0_3.0_1726680212615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline_en_5.5.0_3.0_1726680212615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_stsb_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_pipeline_en.md new file mode 100644 index 00000000000000..3d57b42e914677 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_toxicity_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_toxicity_classification_pipeline pipeline DistilBertForSequenceClassification from newsmediabias +author: John Snow Labs +name: distilbert_toxicity_classification_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_toxicity_classification_pipeline` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_toxicity_classification_pipeline_en_5.5.0_3.0_1726625269654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_toxicity_classification_pipeline_en_5.5.0_3.0_1726625269654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_toxicity_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_toxicity_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_toxicity_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/newsmediabias/DistilBert_Toxicity_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_en.md new file mode 100644 index 00000000000000..2e70def74f9c6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbertmultilang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbertmultilang DistilBertForSequenceClassification from baihaqy +author: John Snow Labs +name: distilbertmultilang +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbertmultilang` is a English model originally trained by baihaqy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbertmultilang_en_5.5.0_3.0_1726696128351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbertmultilang_en_5.5.0_3.0_1726696128351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbertmultilang","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbertmultilang", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbertmultilang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/baihaqy/distilbertmultilang \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_rb156k_ep40_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_rb156k_ep40_en.md new file mode 100644 index 00000000000000..2d6fecc548f9ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilroberta_base_rb156k_ep40_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_rb156k_ep40 RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_base_rb156k_ep40 +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_rb156k_ep40` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_rb156k_ep40_en_5.5.0_3.0_1726626698360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_rb156k_ep40_en_5.5.0_3.0_1726626698360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_rb156k_ep40","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_rb156k_ep40","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_rb156k_ep40| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.0 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-base-rb156k-ep40 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-emobert_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-emobert_english_pipeline_en.md new file mode 100644 index 00000000000000..b18c9281e475d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-emobert_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emobert_english_pipeline pipeline RoBertaForSequenceClassification from NLPinas +author: John Snow Labs +name: emobert_english_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emobert_english_pipeline` is a English model originally trained by NLPinas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emobert_english_pipeline_en_5.5.0_3.0_1726627525323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emobert_english_pipeline_en_5.5.0_3.0_1726627525323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emobert_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emobert_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emobert_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.8 MB| + +## References + +https://huggingface.co/NLPinas/EMoBERT-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-emotions_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-18-emotions_classifier_en.md new file mode 100644 index 00000000000000..5f3d0384300ed4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-emotions_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotions_classifier DistilBertForSequenceClassification from XtraPatrick987 +author: John Snow Labs +name: emotions_classifier +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotions_classifier` is a English model originally trained by XtraPatrick987. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotions_classifier_en_5.5.0_3.0_1726680645783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotions_classifier_en_5.5.0_3.0_1726680645783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotions_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotions_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotions_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/XtraPatrick987/emotions-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-enlm_roberta_81_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-18-enlm_roberta_81_imdb_en.md new file mode 100644 index 00000000000000..79cd253660bc36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-enlm_roberta_81_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English enlm_roberta_81_imdb XlmRoBertaForSequenceClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_81_imdb +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enlm_roberta_81_imdb` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_81_imdb_en_5.5.0_3.0_1726633200185.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_81_imdb_en_5.5.0_3.0_1726633200185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("enlm_roberta_81_imdb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("enlm_roberta_81_imdb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_81_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.6 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-81-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-film95000roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-film95000roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..feba03ca9e279e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-film95000roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English film95000roberta_base_pipeline pipeline RoBertaEmbeddings from AmaiaSolaun +author: John Snow Labs +name: film95000roberta_base_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`film95000roberta_base_pipeline` is a English model originally trained by AmaiaSolaun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/film95000roberta_base_pipeline_en_5.5.0_3.0_1726651634136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/film95000roberta_base_pipeline_en_5.5.0_3.0_1726651634136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("film95000roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("film95000roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|film95000roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/AmaiaSolaun/film95000roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finace_nepal_bhasa_en.md b/docs/_posts/ahmedlone127/2024-09-18-finace_nepal_bhasa_en.md new file mode 100644 index 00000000000000..42c8e019ff3896 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finace_nepal_bhasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finace_nepal_bhasa RoBertaForSequenceClassification from yzhangqs +author: John Snow Labs +name: finace_nepal_bhasa +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finace_nepal_bhasa` is a English model originally trained by yzhangqs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finace_nepal_bhasa_en_5.5.0_3.0_1726627910820.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finace_nepal_bhasa_en_5.5.0_3.0_1726627910820.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("finace_nepal_bhasa","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finace_nepal_bhasa", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finace_nepal_bhasa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.7 MB| + +## References + +https://huggingface.co/yzhangqs/Finace_new \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuned_sentiment_distilbert_base_uncased_model_3000_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuned_sentiment_distilbert_base_uncased_model_3000_en.md new file mode 100644 index 00000000000000..7abe5fd23ac130 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuned_sentiment_distilbert_base_uncased_model_3000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_sentiment_distilbert_base_uncased_model_3000 DistilBertForSequenceClassification from iamsuman +author: John Snow Labs +name: finetuned_sentiment_distilbert_base_uncased_model_3000 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_distilbert_base_uncased_model_3000` is a English model originally trained by iamsuman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_distilbert_base_uncased_model_3000_en_5.5.0_3.0_1726681819603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_distilbert_base_uncased_model_3000_en_5.5.0_3.0_1726681819603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_sentiment_distilbert_base_uncased_model_3000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_sentiment_distilbert_base_uncased_model_3000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_distilbert_base_uncased_model_3000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iamsuman/finetuned-sentiment-distilbert-base-uncased-model-3000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetunedmodel_review_sentimentanalysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetunedmodel_review_sentimentanalysis_pipeline_en.md new file mode 100644 index 00000000000000..78beb99b827aa5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetunedmodel_review_sentimentanalysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetunedmodel_review_sentimentanalysis_pipeline pipeline DistilBertForSequenceClassification from hanyundudddd +author: John Snow Labs +name: finetunedmodel_review_sentimentanalysis_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetunedmodel_review_sentimentanalysis_pipeline` is a English model originally trained by hanyundudddd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetunedmodel_review_sentimentanalysis_pipeline_en_5.5.0_3.0_1726680246117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetunedmodel_review_sentimentanalysis_pipeline_en_5.5.0_3.0_1726680246117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetunedmodel_review_sentimentanalysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetunedmodel_review_sentimentanalysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetunedmodel_review_sentimentanalysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanyundudddd/FinetunedModel_Review_SentimentAnalysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_aguinrodriguezj_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_aguinrodriguezj_en.md new file mode 100644 index 00000000000000..e37eecfe0c6416 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_aguinrodriguezj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_aguinrodriguezj DistilBertForSequenceClassification from aguinrodriguezj +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_aguinrodriguezj +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_aguinrodriguezj` is a English model originally trained by aguinrodriguezj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aguinrodriguezj_en_5.5.0_3.0_1726696542664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aguinrodriguezj_en_5.5.0_3.0_1726696542664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_aguinrodriguezj","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_aguinrodriguezj", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_aguinrodriguezj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aguinrodriguezj/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_sumittyagi25_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_sumittyagi25_en.md new file mode 100644 index 00000000000000..800ed8064b402d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_sumittyagi25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_sumittyagi25 DistilBertForSequenceClassification from sumittyagi25 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_sumittyagi25 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_sumittyagi25` is a English model originally trained by sumittyagi25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sumittyagi25_en_5.5.0_3.0_1726630816702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sumittyagi25_en_5.5.0_3.0_1726630816702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_sumittyagi25","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_sumittyagi25", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_sumittyagi25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sumittyagi25/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-gal_ner_iwcg_5_en.md b/docs/_posts/ahmedlone127/2024-09-18-gal_ner_iwcg_5_en.md new file mode 100644 index 00000000000000..24cb356c310abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-gal_ner_iwcg_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_ner_iwcg_5 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_ner_iwcg_5 +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ner_iwcg_5` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_5_en_5.5.0_3.0_1726701802473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ner_iwcg_5_en_5.5.0_3.0_1726701802473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_iwcg_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ner_iwcg_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ner_iwcg_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/homersimpson/gal-ner-iwcg-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en.md new file mode 100644 index 00000000000000..3bce6526e1b69e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline pipeline BertForSequenceClassification from tanoManzo +author: John Snow Labs +name: gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline` is a English model originally trained by tanoManzo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en_5.5.0_3.0_1726647930756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline_en_5.5.0_3.0_1726647930756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gena_lm_bert_base_t2t_ft_hepg2_1kbphg19_dhss_h3k27ac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.0 MB| + +## References + +https://huggingface.co/tanoManzo/gena-lm-bert-base-t2t_ft_Hepg2_1kbpHG19_DHSs_H3K27AC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-intel_huggingface_workshop_bot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-intel_huggingface_workshop_bot_pipeline_en.md new file mode 100644 index 00000000000000..ab083ff1717595 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-intel_huggingface_workshop_bot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English intel_huggingface_workshop_bot_pipeline pipeline DistilBertForSequenceClassification from arjunraghunandanan +author: John Snow Labs +name: intel_huggingface_workshop_bot_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intel_huggingface_workshop_bot_pipeline` is a English model originally trained by arjunraghunandanan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intel_huggingface_workshop_bot_pipeline_en_5.5.0_3.0_1726694996331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intel_huggingface_workshop_bot_pipeline_en_5.5.0_3.0_1726694996331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("intel_huggingface_workshop_bot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("intel_huggingface_workshop_bot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intel_huggingface_workshop_bot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/arjunraghunandanan/intel-huggingface-workshop-bot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mach_3_en.md b/docs/_posts/ahmedlone127/2024-09-18-mach_3_en.md new file mode 100644 index 00000000000000..39829232db9b99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mach_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mach_3 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: mach_3 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mach_3` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mach_3_en_5.5.0_3.0_1726621973653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mach_3_en_5.5.0_3.0_1726621973653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("mach_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("mach_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mach_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Mach_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mongolian_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-18-mongolian_roberta_large_en.md new file mode 100644 index 00000000000000..6ab3f22f00dacd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mongolian_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mongolian_roberta_large RoBertaEmbeddings from bayartsogt +author: John Snow Labs +name: mongolian_roberta_large +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mongolian_roberta_large` is a English model originally trained by bayartsogt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_roberta_large_en_5.5.0_3.0_1726651443337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_roberta_large_en_5.5.0_3.0_1726651443337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("mongolian_roberta_large","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("mongolian_roberta_large","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mongolian_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/bayartsogt/mongolian-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-moviegenreprediction_en.md b/docs/_posts/ahmedlone127/2024-09-18-moviegenreprediction_en.md new file mode 100644 index 00000000000000..22f63c1c7c8d64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-moviegenreprediction_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English moviegenreprediction DistilBertForSequenceClassification from shaggysus +author: John Snow Labs +name: moviegenreprediction +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`moviegenreprediction` is a English model originally trained by shaggysus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/moviegenreprediction_en_5.5.0_3.0_1726625728202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/moviegenreprediction_en_5.5.0_3.0_1726625728202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("moviegenreprediction","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("moviegenreprediction", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|moviegenreprediction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shaggysus/MovieGenrePrediction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-18-multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx.md new file mode 100644 index 00000000000000..da7c55d898ff0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual multilingual_xlm_roberta_for_ner_sedaorcin_pipeline pipeline XlmRoBertaForTokenClassification from sedaorcin +author: John Snow Labs +name: multilingual_xlm_roberta_for_ner_sedaorcin_pipeline +date: 2024-09-18 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multilingual_xlm_roberta_for_ner_sedaorcin_pipeline` is a Multilingual model originally trained by sedaorcin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx_5.5.0_3.0_1726636499064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multilingual_xlm_roberta_for_ner_sedaorcin_pipeline_xx_5.5.0_3.0_1726636499064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multilingual_xlm_roberta_for_ner_sedaorcin_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multilingual_xlm_roberta_for_ner_sedaorcin_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multilingual_xlm_roberta_for_ner_sedaorcin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|853.8 MB| + +## References + +https://huggingface.co/sedaorcin/multilingual-xlm-roberta-for-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mymodel_isom5240group17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-mymodel_isom5240group17_pipeline_en.md new file mode 100644 index 00000000000000..18cd54bc0cb02a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mymodel_isom5240group17_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mymodel_isom5240group17_pipeline pipeline RoBertaForSequenceClassification from isom5240group17 +author: John Snow Labs +name: mymodel_isom5240group17_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mymodel_isom5240group17_pipeline` is a English model originally trained by isom5240group17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mymodel_isom5240group17_pipeline_en_5.5.0_3.0_1726621539817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mymodel_isom5240group17_pipeline_en_5.5.0_3.0_1726621539817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mymodel_isom5240group17_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mymodel_isom5240group17_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mymodel_isom5240group17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/isom5240group17/myModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en.md new file mode 100644 index 00000000000000..7c63c0b199f38b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en_5.5.0_3.0_1726650408758.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline_en_5.5.0_3.0_1726650408758.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random0_seed1_twitter_roberta_base_jun2021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random0_seed1-twitter-roberta-base-jun2021 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ooc_patch_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-ooc_patch_v1_pipeline_en.md new file mode 100644 index 00000000000000..f73b773b9e09ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ooc_patch_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ooc_patch_v1_pipeline pipeline DistilBertForSequenceClassification from Ksgk-fy +author: John Snow Labs +name: ooc_patch_v1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ooc_patch_v1_pipeline` is a English model originally trained by Ksgk-fy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ooc_patch_v1_pipeline_en_5.5.0_3.0_1726630788562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ooc_patch_v1_pipeline_en_5.5.0_3.0_1726630788562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ooc_patch_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ooc_patch_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ooc_patch_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ksgk-fy/ooc_patch_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-opticalbert_qa_squadv1_cased_en.md b/docs/_posts/ahmedlone127/2024-09-18-opticalbert_qa_squadv1_cased_en.md new file mode 100644 index 00000000000000..a011795596225b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-opticalbert_qa_squadv1_cased_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English opticalbert_qa_squadv1_cased BertForQuestionAnswering from opticalmaterials +author: John Snow Labs +name: opticalbert_qa_squadv1_cased +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opticalbert_qa_squadv1_cased` is a English model originally trained by opticalmaterials. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opticalbert_qa_squadv1_cased_en_5.5.0_3.0_1726659146866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opticalbert_qa_squadv1_cased_en_5.5.0_3.0_1726659146866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("opticalbert_qa_squadv1_cased","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("opticalbert_qa_squadv1_cased", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opticalbert_qa_squadv1_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.5 MB| + +## References + +https://huggingface.co/opticalmaterials/opticalbert_qa_squadv1_cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-org_bot_classifier_dstilbert_en.md b/docs/_posts/ahmedlone127/2024-09-18-org_bot_classifier_dstilbert_en.md new file mode 100644 index 00000000000000..e00b680a057327 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-org_bot_classifier_dstilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English org_bot_classifier_dstilbert DistilBertForSequenceClassification from Jjzzzz +author: John Snow Labs +name: org_bot_classifier_dstilbert +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`org_bot_classifier_dstilbert` is a English model originally trained by Jjzzzz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/org_bot_classifier_dstilbert_en_5.5.0_3.0_1726625547651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/org_bot_classifier_dstilbert_en_5.5.0_3.0_1726625547651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("org_bot_classifier_dstilbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("org_bot_classifier_dstilbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|org_bot_classifier_dstilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jjzzzz/org_bot_classifier_dstilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en.md new file mode 100644 index 00000000000000..6b57d95fddb335 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline pipeline RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en_5.5.0_3.0_1726649403634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline_en_5.5.0_3.0_1726649403634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_carlos_venegas_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-carlos-venegas + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_saul_burgos_en.md b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_saul_burgos_en.md new file mode 100644 index 00000000000000..19092b37d0e965 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-platzi_distilroberta_base_mrpc_glue_saul_burgos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_saul_burgos RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_saul_burgos +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_saul_burgos` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_saul_burgos_en_5.5.0_3.0_1726666439544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_saul_burgos_en_5.5.0_3.0_1726666439544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_saul_burgos","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_saul_burgos", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_saul_burgos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-saul-burgos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-ptcrawl_plus_legal_large_v1_6__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-18-ptcrawl_plus_legal_large_v1_6__checkpoint_last_en.md new file mode 100644 index 00000000000000..d498c9be8b211e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-ptcrawl_plus_legal_large_v1_6__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ptcrawl_plus_legal_large_v1_6__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_large_v1_6__checkpoint_last +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_large_v1_6__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_6__checkpoint_last_en_5.5.0_3.0_1726678638354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_6__checkpoint_last_en_5.5.0_3.0_1726678638354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_large_v1_6__checkpoint_last","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_large_v1_6__checkpoint_last","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_large_v1_6__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|843.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_large_v1_6__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_cased_portuguese_c_corpus_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_cased_portuguese_c_corpus_en.md new file mode 100644 index 00000000000000..4f0444a249b71a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_cased_portuguese_c_corpus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_cased_portuguese_c_corpus RoBertaEmbeddings from rosimeirecosta +author: John Snow Labs +name: roberta_base_cased_portuguese_c_corpus +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_cased_portuguese_c_corpus` is a English model originally trained by rosimeirecosta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_cased_portuguese_c_corpus_en_5.5.0_3.0_1726618050564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_cased_portuguese_c_corpus_en_5.5.0_3.0_1726618050564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_cased_portuguese_c_corpus","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_cased_portuguese_c_corpus","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_cased_portuguese_c_corpus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|590.6 MB| + +## References + +https://huggingface.co/rosimeirecosta/roberta-base-cased-pt-c-corpus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_70_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_70_pipeline_en.md new file mode 100644 index 00000000000000..b9590162958565 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_70_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_70_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_70_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_70_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_70_pipeline_en_5.5.0_3.0_1726678343365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_70_pipeline_en_5.5.0_3.0_1726678343365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_70_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_70_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_70_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_70 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_1_pipeline_en.md new file mode 100644 index 00000000000000..7cc8698b7e798c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_1_pipeline pipeline RoBertaForSequenceClassification from sara-nabhani +author: John Snow Labs +name: roberta_base_finetuned_1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_1_pipeline` is a English model originally trained by sara-nabhani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_1_pipeline_en_5.5.0_3.0_1726622284435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_1_pipeline_en_5.5.0_3.0_1726622284435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.3 MB| + +## References + +https://huggingface.co/sara-nabhani/roberta-base-finetuned-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_bible_accelerate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_bible_accelerate_pipeline_en.md new file mode 100644 index 00000000000000..f7fabf8d12820a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_bible_accelerate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_bible_accelerate_pipeline pipeline RoBertaEmbeddings from Pragash-Mohanarajah +author: John Snow Labs +name: roberta_base_finetuned_bible_accelerate_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_bible_accelerate_pipeline` is a English model originally trained by Pragash-Mohanarajah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_bible_accelerate_pipeline_en_5.5.0_3.0_1726651396259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_bible_accelerate_pipeline_en_5.5.0_3.0_1726651396259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_bible_accelerate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_bible_accelerate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_bible_accelerate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/Pragash-Mohanarajah/roberta-base-finetuned-bible-accelerate + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_college_reviews_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_college_reviews_en.md new file mode 100644 index 00000000000000..795322f0ee2ab8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_finetuned_college_reviews_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_college_reviews RoBertaEmbeddings from Mohit09gupta +author: John Snow Labs +name: roberta_base_finetuned_college_reviews +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_college_reviews` is a English model originally trained by Mohit09gupta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_college_reviews_en_5.5.0_3.0_1726678451807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_college_reviews_en_5.5.0_3.0_1726678451807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_college_reviews","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_college_reviews","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_college_reviews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|467.1 MB| + +## References + +https://huggingface.co/Mohit09gupta/roberta-base-finetuned-College-Reviews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_imdb_aypan17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_imdb_aypan17_pipeline_en.md new file mode 100644 index 00000000000000..889d8b5b473e95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_imdb_aypan17_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_imdb_aypan17_pipeline pipeline RoBertaForSequenceClassification from aypan17 +author: John Snow Labs +name: roberta_base_imdb_aypan17_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_imdb_aypan17_pipeline` is a English model originally trained by aypan17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_imdb_aypan17_pipeline_en_5.5.0_3.0_1726690469274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_imdb_aypan17_pipeline_en_5.5.0_3.0_1726690469274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_imdb_aypan17_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_imdb_aypan17_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_imdb_aypan17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/aypan17/roberta-base-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_fine_tuned_text_classification_slovene_data_augmentation_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_fine_tuned_text_classification_slovene_data_augmentation_en.md new file mode 100644 index 00000000000000..85e864c036929b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_fine_tuned_text_classification_slovene_data_augmentation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_fine_tuned_text_classification_slovene_data_augmentation RoBertaForSequenceClassification from Sleoruiz +author: John Snow Labs +name: roberta_fine_tuned_text_classification_slovene_data_augmentation +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_fine_tuned_text_classification_slovene_data_augmentation` is a English model originally trained by Sleoruiz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_fine_tuned_text_classification_slovene_data_augmentation_en_5.5.0_3.0_1726649890818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_fine_tuned_text_classification_slovene_data_augmentation_en_5.5.0_3.0_1726649890818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_fine_tuned_text_classification_slovene_data_augmentation","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_fine_tuned_text_classification_slovene_data_augmentation", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_fine_tuned_text_classification_slovene_data_augmentation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|461.3 MB| + +## References + +https://huggingface.co/Sleoruiz/roberta-fine-tuned-text-classification-SL-data-augmentation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_for_pii_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_for_pii_en.md new file mode 100644 index 00000000000000..9ca8441bdb97b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_for_pii_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_for_pii RoBertaForTokenClassification from moo3030 +author: John Snow Labs +name: roberta_for_pii +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_for_pii` is a English model originally trained by moo3030. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_for_pii_en_5.5.0_3.0_1726652660748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_for_pii_en_5.5.0_3.0_1726652660748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_for_pii","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_for_pii", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_for_pii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|438.5 MB| + +## References + +https://huggingface.co/moo3030/roberta-for-pii \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_ner_single_label_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_ner_single_label_pipeline_en.md new file mode 100644 index 00000000000000..9a8a24a2a679ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_large_finetuned_ner_single_label_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_ner_single_label_pipeline pipeline RoBertaForTokenClassification from DDDacc +author: John Snow Labs +name: roberta_large_finetuned_ner_single_label_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_ner_single_label_pipeline` is a English model originally trained by DDDacc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_single_label_pipeline_en_5.5.0_3.0_1726638382672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_single_label_pipeline_en_5.5.0_3.0_1726638382672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_finetuned_ner_single_label_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_finetuned_ner_single_label_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_ner_single_label_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DDDacc/RoBERTa-Large-finetuned-ner-single-label + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_nli_group71_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_nli_group71_pipeline_en.md new file mode 100644 index 00000000000000..e2dd68e06a3f3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_nli_group71_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_nli_group71_pipeline pipeline RoBertaForSequenceClassification from awashh +author: John Snow Labs +name: roberta_nli_group71_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_nli_group71_pipeline` is a English model originally trained by awashh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_nli_group71_pipeline_en_5.5.0_3.0_1726650290207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_nli_group71_pipeline_en_5.5.0_3.0_1726650290207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_nli_group71_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_nli_group71_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_nli_group71_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|456.1 MB| + +## References + +https://huggingface.co/awashh/RoBERTa-NLI-Group71 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_nature_crpo_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_nature_crpo_en.md new file mode 100644 index 00000000000000..7573c07c70d6ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_poetry_nature_crpo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_poetry_nature_crpo RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_nature_crpo +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_nature_crpo` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_nature_crpo_en_5.5.0_3.0_1726678322484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_nature_crpo_en_5.5.0_3.0_1726678322484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_nature_crpo","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_nature_crpo","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_nature_crpo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-nature-crpo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_retrained_350k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_retrained_350k_pipeline_en.md new file mode 100644 index 00000000000000..b74ed14e53d940 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_retrained_350k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_retrained_350k_pipeline pipeline RoBertaEmbeddings from bitsanlp +author: John Snow Labs +name: roberta_retrained_350k_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_350k_pipeline` is a English model originally trained by bitsanlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_350k_pipeline_en_5.5.0_3.0_1726678438848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_350k_pipeline_en_5.5.0_3.0_1726678438848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_retrained_350k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_retrained_350k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_350k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/bitsanlp/roberta-retrained-350k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en.md new file mode 100644 index 00000000000000..fb658109f40067 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_spanish_clinical_trials_medic_attr_ner_pipeline pipeline RoBertaForTokenClassification from medspaner +author: John Snow Labs +name: roberta_spanish_clinical_trials_medic_attr_ner_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_spanish_clinical_trials_medic_attr_ner_pipeline` is a English model originally trained by medspaner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en_5.5.0_3.0_1726653453128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_spanish_clinical_trials_medic_attr_ner_pipeline_en_5.5.0_3.0_1726653453128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_spanish_clinical_trials_medic_attr_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_spanish_clinical_trials_medic_attr_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_spanish_clinical_trials_medic_attr_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.3 MB| + +## References + +https://huggingface.co/medspaner/roberta-es-clinical-trials-medic-attr-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en.md new file mode 100644 index 00000000000000..719965bde72af9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_tcr_data_cl_cardiff_cl_only20271_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_tcr_data_cl_cardiff_cl_only20271_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_tcr_data_cl_cardiff_cl_only20271_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en_5.5.0_3.0_1726697159548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_tcr_data_cl_cardiff_cl_only20271_pipeline_en_5.5.0_3.0_1726697159548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_tcr_data_cl_cardiff_cl_only20271_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_tcr_data_cl_cardiff_cl_only20271_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_tcr_data_cl_cardiff_cl_only20271_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.6 MB| + +## References + +https://huggingface.co/haryoaw/scenario-TCR_data-cl-cardiff_cl_only20271 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_en.md new file mode 100644 index 00000000000000..309874f631de29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_base_uncased_issues_128_martinwunderlich_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_martinwunderlich BertSentenceEmbeddings from martinwunderlich +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_martinwunderlich +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_martinwunderlich` is a English model originally trained by martinwunderlich. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_martinwunderlich_en_5.5.0_3.0_1726694214492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_martinwunderlich_en_5.5.0_3.0_1726694214492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_martinwunderlich","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_martinwunderlich","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_martinwunderlich| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/martinwunderlich/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_convbert_base_generator_finnish_fi.md b/docs/_posts/ahmedlone127/2024-09-18-sent_convbert_base_generator_finnish_fi.md new file mode 100644 index 00000000000000..e4b304471910a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_convbert_base_generator_finnish_fi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Finnish sent_convbert_base_generator_finnish BertSentenceEmbeddings from Finnish-NLP +author: John Snow Labs +name: sent_convbert_base_generator_finnish +date: 2024-09-18 +tags: [fi, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: fi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_convbert_base_generator_finnish` is a Finnish model originally trained by Finnish-NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_convbert_base_generator_finnish_fi_5.5.0_3.0_1726676163191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_convbert_base_generator_finnish_fi_5.5.0_3.0_1726676163191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_convbert_base_generator_finnish","fi") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_convbert_base_generator_finnish","fi") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_convbert_base_generator_finnish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|fi| +|Size:|181.3 MB| + +## References + +https://huggingface.co/Finnish-NLP/convbert-base-generator-finnish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_greeksocialbert_base_greek_uncased_v1_el.md b/docs/_posts/ahmedlone127/2024-09-18-sent_greeksocialbert_base_greek_uncased_v1_el.md new file mode 100644 index 00000000000000..e53d167c294d2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_greeksocialbert_base_greek_uncased_v1_el.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Modern Greek (1453-) sent_greeksocialbert_base_greek_uncased_v1 BertSentenceEmbeddings from gealexandri +author: John Snow Labs +name: sent_greeksocialbert_base_greek_uncased_v1 +date: 2024-09-18 +tags: [el, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_greeksocialbert_base_greek_uncased_v1` is a Modern Greek (1453-) model originally trained by gealexandri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_greeksocialbert_base_greek_uncased_v1_el_5.5.0_3.0_1726694204097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_greeksocialbert_base_greek_uncased_v1_el_5.5.0_3.0_1726694204097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_greeksocialbert_base_greek_uncased_v1","el") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_greeksocialbert_base_greek_uncased_v1","el") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_greeksocialbert_base_greek_uncased_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|el| +|Size:|421.3 MB| + +## References + +https://huggingface.co/gealexandri/greeksocialbert-base-greek-uncased-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sentiment_ufakz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sentiment_ufakz_pipeline_en.md new file mode 100644 index 00000000000000..80d4a3ebca0597 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sentiment_ufakz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_ufakz_pipeline pipeline DistilBertForSequenceClassification from ufakz +author: John Snow Labs +name: sentiment_ufakz_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_ufakz_pipeline` is a English model originally trained by ufakz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_ufakz_pipeline_en_5.5.0_3.0_1726676766688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_ufakz_pipeline_en_5.5.0_3.0_1726676766688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_ufakz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_ufakz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_ufakz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ufakz/sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en.md b/docs/_posts/ahmedlone127/2024-09-18-stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en.md new file mode 100644 index 00000000000000..48b203c72fe202 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en_5.5.0_3.0_1726625717043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52_en_5.5.0_3.0_1726625717043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_0_2024_07_26_14_26_52| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-0-2024-07-26_14-26-52 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-t_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-t_8_pipeline_en.md new file mode 100644 index 00000000000000..a8023768400444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-t_8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_8_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_8_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_8_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_8_pipeline_en_5.5.0_3.0_1726642365902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_8_pipeline_en_5.5.0_3.0_1726642365902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-test_model_cynthiaaaaaaaa_en.md b/docs/_posts/ahmedlone127/2024-09-18-test_model_cynthiaaaaaaaa_en.md new file mode 100644 index 00000000000000..923f93f96eb9f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-test_model_cynthiaaaaaaaa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_model_cynthiaaaaaaaa DistilBertForSequenceClassification from Cynthiaaaaaaaa +author: John Snow Labs +name: test_model_cynthiaaaaaaaa +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_cynthiaaaaaaaa` is a English model originally trained by Cynthiaaaaaaaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_cynthiaaaaaaaa_en_5.5.0_3.0_1726669790814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_cynthiaaaaaaaa_en_5.5.0_3.0_1726669790814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_cynthiaaaaaaaa","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_cynthiaaaaaaaa", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_cynthiaaaaaaaa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Cynthiaaaaaaaa/test_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_g_v1_867238_en.md b/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_g_v1_867238_en.md new file mode 100644 index 00000000000000..e9214fb9b5adf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-tibetan_roberta_g_v1_867238_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tibetan_roberta_g_v1_867238 RoBertaEmbeddings from spsither +author: John Snow Labs +name: tibetan_roberta_g_v1_867238 +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tibetan_roberta_g_v1_867238` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tibetan_roberta_g_v1_867238_en_5.5.0_3.0_1726626677602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tibetan_roberta_g_v1_867238_en_5.5.0_3.0_1726626677602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("tibetan_roberta_g_v1_867238","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("tibetan_roberta_g_v1_867238","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tibetan_roberta_g_v1_867238| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|412.1 MB| + +## References + +https://huggingface.co/spsither/tibetan_RoBERTa_G_v1_867238 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_pipeline_en.md new file mode 100644 index 00000000000000..1a6a88fe9b0d68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random3_seed2_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random3_seed2_bernice_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random3_seed2_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random3_seed2_bernice_pipeline_en_5.5.0_3.0_1726672715966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random3_seed2_bernice_pipeline_en_5.5.0_3.0_1726672715966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("topic_topic_random3_seed2_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("topic_topic_random3_seed2_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random3_seed2_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|805.6 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random3_seed2-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-toxicdistilbert_en.md b/docs/_posts/ahmedlone127/2024-09-18-toxicdistilbert_en.md new file mode 100644 index 00000000000000..eb2d056083cd72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-toxicdistilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English toxicdistilbert DistilBertForSequenceClassification from YuryCHep +author: John Snow Labs +name: toxicdistilbert +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicdistilbert` is a English model originally trained by YuryCHep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicdistilbert_en_5.5.0_3.0_1726695273249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicdistilbert_en_5.5.0_3.0_1726695273249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxicdistilbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("toxicdistilbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicdistilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/YuryCHep/TOXICDISTILBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-twitterfin_padding80model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-twitterfin_padding80model_pipeline_en.md new file mode 100644 index 00000000000000..5e72048700b0ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-twitterfin_padding80model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitterfin_padding80model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding80model_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding80model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding80model_pipeline_en_5.5.0_3.0_1726625369293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding80model_pipeline_en_5.5.0_3.0_1726625369293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitterfin_padding80model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitterfin_padding80model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding80model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding80model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_clickbait_mattzid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_clickbait_mattzid_pipeline_en.md new file mode 100644 index 00000000000000..90293d8477d8e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_clickbait_mattzid_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_clickbait_mattzid_pipeline pipeline XlmRoBertaForSequenceClassification from MattZid +author: John Snow Labs +name: xlm_roberta_base_clickbait_mattzid_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_clickbait_mattzid_pipeline` is a English model originally trained by MattZid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_clickbait_mattzid_pipeline_en_5.5.0_3.0_1726685513238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_clickbait_mattzid_pipeline_en_5.5.0_3.0_1726685513238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_clickbait_mattzid_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_clickbait_mattzid_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_clickbait_mattzid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|856.5 MB| + +## References + +https://huggingface.co/MattZid/xlm-roberta-base-clickbait + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_language_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_language_pipeline_en.md new file mode 100644 index 00000000000000..a675fd63411335 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_language_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_language_pipeline pipeline XlmRoBertaForSequenceClassification from ms25 +author: John Snow Labs +name: xlm_roberta_base_finetuned_language_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_language_pipeline` is a English model originally trained by ms25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_language_pipeline_en_5.5.0_3.0_1726672675542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_language_pipeline_en_5.5.0_3.0_1726672675542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_language_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_language_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_language_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|904.3 MB| + +## References + +https://huggingface.co/ms25/xlm-roberta-base-finetuned-language + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_en.md new file mode 100644 index 00000000000000..d529783c464f21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_drigb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_drigb XlmRoBertaForTokenClassification from drigb +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_drigb +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_drigb` is a English model originally trained by drigb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_drigb_en_5.5.0_3.0_1726657436243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_drigb_en_5.5.0_3.0_1726657436243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_drigb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_drigb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_drigb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/drigb/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en.md new file mode 100644 index 00000000000000..4e6b4cd0a5569f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_mealduct_pipeline pipeline XlmRoBertaForTokenClassification from MealDuct +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_mealduct_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_mealduct_pipeline` is a English model originally trained by MealDuct. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en_5.5.0_3.0_1726656693508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_mealduct_pipeline_en_5.5.0_3.0_1726656693508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_mealduct_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_mealduct_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_mealduct_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/MealDuct/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en.md new file mode 100644 index 00000000000000..758dff0eae4487 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline pipeline XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline` is a English model originally trained by reinoudbosch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en_5.5.0_3.0_1726635646117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline_en_5.5.0_3.0_1726635646117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_reinoudbosch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en.md new file mode 100644 index 00000000000000..993518f9be3df2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline pipeline XlmRoBertaForTokenClassification from Abdelkareem +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline` is a English model originally trained by Abdelkareem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en_5.5.0_3.0_1726663773276.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline_en_5.5.0_3.0_1726663773276.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_abdelkareem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|838.4 MB| + +## References + +https://huggingface.co/Abdelkareem/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_conorjudge_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_conorjudge_en.md new file mode 100644 index 00000000000000..329f5235b1a568 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_conorjudge_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_conorjudge XlmRoBertaForTokenClassification from conorjudge +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_conorjudge +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_conorjudge` is a English model originally trained by conorjudge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_conorjudge_en_5.5.0_3.0_1726657086323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_conorjudge_en_5.5.0_3.0_1726657086323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_conorjudge","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_conorjudge", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_conorjudge| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/conorjudge/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jbreunig_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jbreunig_en.md new file mode 100644 index 00000000000000..de01bae6ce3d67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jbreunig_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_jbreunig XlmRoBertaForTokenClassification from jbreunig +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_jbreunig +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_jbreunig` is a English model originally trained by jbreunig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jbreunig_en_5.5.0_3.0_1726656233749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jbreunig_en_5.5.0_3.0_1726656233749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_jbreunig","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_jbreunig", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_jbreunig| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/jbreunig/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en.md new file mode 100644 index 00000000000000..fb6fa458976c0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline pipeline XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline` is a English model originally trained by jjglilleberg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en_5.5.0_3.0_1726701846152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline_en_5.5.0_3.0_1726701846152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_jjglilleberg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_nepal_bhasa_vietnam_train_1_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_nepal_bhasa_vietnam_train_1_en.md new file mode 100644 index 00000000000000..a0a040a9443863 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_nepal_bhasa_vietnam_train_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_train_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_train_1 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_train_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_train_1_en_5.5.0_3.0_1726660349372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_train_1_en_5.5.0_3.0_1726660349372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_train_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_train_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_train_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|794.3 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-train-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_english_10000_xnli_english_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_english_10000_xnli_english_en.md new file mode 100644 index 00000000000000..9da22216b5fbca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_trimmed_english_10000_xnli_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_10000_xnli_english XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_10000_xnli_english +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_10000_xnli_english` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_10000_xnli_english_en_5.5.0_3.0_1726671560733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_10000_xnli_english_en_5.5.0_3.0_1726671560733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_10000_xnli_english","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_10000_xnli_english", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_10000_xnli_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|353.6 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-10000-xnli-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en.md new file mode 100644 index 00000000000000..8022de68702a66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_15000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_15000 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_15000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en_5.5.0_3.0_1726672372049.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_en_5.5.0_3.0_1726672372049.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_15000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_15000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_15000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|367.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-15000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en.md new file mode 100644 index 00000000000000..862d97abbafe7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en_5.5.0_3.0_1726672390020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline_en_5.5.0_3.0_1726672390020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_15000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|367.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-15000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_en.md new file mode 100644 index 00000000000000..32c7e6ed684947 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlmr_model_name_finetuned_panx_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_model_name_finetuned_panx_german XlmRoBertaForTokenClassification from Denilah +author: John Snow Labs +name: xlmr_model_name_finetuned_panx_german +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_model_name_finetuned_panx_german` is a English model originally trained by Denilah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_model_name_finetuned_panx_german_en_5.5.0_3.0_1726701746949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_model_name_finetuned_panx_german_en_5.5.0_3.0_1726701746949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_model_name_finetuned_panx_german","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_model_name_finetuned_panx_german", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_model_name_finetuned_panx_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.7 MB| + +## References + +https://huggingface.co/Denilah/xlmr_model_name-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_base_seed_3_en.md b/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_base_seed_3_en.md new file mode 100644 index 00000000000000..d181543be0e96b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_base_seed_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English yelp_polarity_roberta_base_seed_3 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: yelp_polarity_roberta_base_seed_3 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yelp_polarity_roberta_base_seed_3` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yelp_polarity_roberta_base_seed_3_en_5.5.0_3.0_1726628331831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yelp_polarity_roberta_base_seed_3_en_5.5.0_3.0_1726628331831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("yelp_polarity_roberta_base_seed_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("yelp_polarity_roberta_base_seed_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yelp_polarity_roberta_base_seed_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/utahnlp/yelp_polarity_roberta-base_seed-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_large_seed_2_en.md b/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_large_seed_2_en.md new file mode 100644 index 00000000000000..6d49840b41aebb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-yelp_polarity_roberta_large_seed_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English yelp_polarity_roberta_large_seed_2 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: yelp_polarity_roberta_large_seed_2 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yelp_polarity_roberta_large_seed_2` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yelp_polarity_roberta_large_seed_2_en_5.5.0_3.0_1726649776701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yelp_polarity_roberta_large_seed_2_en_5.5.0_3.0_1726649776701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("yelp_polarity_roberta_large_seed_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("yelp_polarity_roberta_large_seed_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yelp_polarity_roberta_large_seed_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/utahnlp/yelp_polarity_roberta-large_seed-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-alberta_base_nbolton04_en.md b/docs/_posts/ahmedlone127/2024-09-19-alberta_base_nbolton04_en.md new file mode 100644 index 00000000000000..4918c17fd8fef0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-alberta_base_nbolton04_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English alberta_base_nbolton04 RoBertaForSequenceClassification from nbolton04 +author: John Snow Labs +name: alberta_base_nbolton04 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alberta_base_nbolton04` is a English model originally trained by nbolton04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alberta_base_nbolton04_en_5.5.0_3.0_1726726447654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alberta_base_nbolton04_en_5.5.0_3.0_1726726447654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("alberta_base_nbolton04","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("alberta_base_nbolton04", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alberta_base_nbolton04| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|441.6 MB| + +## References + +https://huggingface.co/nbolton04/alberta_base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-angela_punc_diacritics_eval_en.md b/docs/_posts/ahmedlone127/2024-09-19-angela_punc_diacritics_eval_en.md new file mode 100644 index 00000000000000..fb490673920dfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-angela_punc_diacritics_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_punc_diacritics_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punc_diacritics_eval +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_punc_diacritics_eval` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punc_diacritics_eval_en_5.5.0_3.0_1726708862780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punc_diacritics_eval_en_5.5.0_3.0_1726708862780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punc_diacritics_eval","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punc_diacritics_eval", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punc_diacritics_eval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punc_diacritics_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en.md new file mode 100644 index 00000000000000..8c3e2fb4c8f846 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline pipeline BertEmbeddings from mmcleige +author: John Snow Labs +name: arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline` is a English model originally trained by mmcleige. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en_5.5.0_3.0_1726744452421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline_en_5.5.0_3.0_1726744452421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arbovirus_bert_base_dylr_1e_5_n_10_bosnian_32_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/mmcleige/arbovirus_bert_base_DYLR.1e-5_N.10_BS.32 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_blbooks_cased_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_blbooks_cased_en.md new file mode 100644 index 00000000000000..dde3db1a234653 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_blbooks_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_blbooks_cased BertEmbeddings from bigscience-historical-texts +author: John Snow Labs +name: bert_base_blbooks_cased +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_blbooks_cased` is a English model originally trained by bigscience-historical-texts. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_blbooks_cased_en_5.5.0_3.0_1726705673902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_blbooks_cased_en_5.5.0_3.0_1726705673902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_blbooks_cased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_blbooks_cased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_blbooks_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/bigscience-historical-texts/bert-base-blbooks-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md new file mode 100644 index 00000000000000..40cb46cb8ae0d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_multilingual_uncased_finetuned_hp_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_hp_pipeline pipeline BertEmbeddings from rman-rahimi-29 +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_hp_pipeline +date: 2024-09-19 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_hp_pipeline` is a Multilingual model originally trained by rman-rahimi-29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_hp_pipeline_xx_5.5.0_3.0_1726731805654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_hp_pipeline_xx_5.5.0_3.0_1726731805654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_finetuned_hp_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_finetuned_hp_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_hp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/rman-rahimi-29/bert-base-multilingual-uncased-finetuned-hp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_spanish_analysis_app_questions_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_spanish_analysis_app_questions_en.md new file mode 100644 index 00000000000000..2f12e5eb2477f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_spanish_analysis_app_questions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_spanish_analysis_app_questions BertForSequenceClassification from devdroide +author: John Snow Labs +name: bert_base_spanish_analysis_app_questions +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_analysis_app_questions` is a English model originally trained by devdroide. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_analysis_app_questions_en_5.5.0_3.0_1726770634994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_analysis_app_questions_en_5.5.0_3.0_1726770634994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_spanish_analysis_app_questions","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_spanish_analysis_app_questions", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_analysis_app_questions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.9 MB| + +## References + +https://huggingface.co/devdroide/bert-base-spanish-analysis-app-questions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ag_news_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ag_news_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..fe7da2ff18168c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_ag_news_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_ag_news_finetuned_pipeline pipeline BertForSequenceClassification from odunola +author: John Snow Labs +name: bert_base_uncased_ag_news_finetuned_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ag_news_finetuned_pipeline` is a English model originally trained by odunola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ag_news_finetuned_pipeline_en_5.5.0_3.0_1726739835962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ag_news_finetuned_pipeline_en_5.5.0_3.0_1726739835962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ag_news_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ag_news_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ag_news_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/odunola/bert-base-uncased-ag-news-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_squad_sonalh_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_squad_sonalh_en.md new file mode 100644 index 00000000000000..d611537fd1ba9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_squad_sonalh_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad_sonalh BertForQuestionAnswering from SonalH +author: John Snow Labs +name: bert_base_uncased_finetuned_squad_sonalh +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad_sonalh` is a English model originally trained by SonalH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_sonalh_en_5.5.0_3.0_1726765269370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad_sonalh_en_5.5.0_3.0_1726765269370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad_sonalh","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad_sonalh", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad_sonalh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/SonalH/bert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sclarge_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sclarge_en.md new file mode 100644 index 00000000000000..805e93f3252bb7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_sclarge_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_sclarge BertEmbeddings from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_uncased_sclarge +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_sclarge` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sclarge_en_5.5.0_3.0_1726734685884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sclarge_en_5.5.0_3.0_1726734685884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_sclarge","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_sclarge","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_sclarge| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-uncased-sclarge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en.md new file mode 100644 index 00000000000000..1f27b5a33a73fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline pipeline DistilBertForSequenceClassification from ArafatBHossain +author: John Snow Labs +name: bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline` is a English model originally trained by ArafatBHossain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726741554074.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726741554074.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_distilled_multi_teacher_model_random_tweet_emotion_epoch7_alpha0_8_refined_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArafatBHossain/bert-distilled-multi_teacher_model_random_tweet_emotion_epoch7_alpha0.8_refined + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_pipeline_en.md new file mode 100644 index 00000000000000..5b4eace3a17ebe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_racial_cross_validation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_racial_cross_validation_pipeline pipeline DistilBertForSequenceClassification from jamnik99 +author: John Snow Labs +name: bert_racial_cross_validation_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_racial_cross_validation_pipeline` is a English model originally trained by jamnik99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_racial_cross_validation_pipeline_en_5.5.0_3.0_1726719096016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_racial_cross_validation_pipeline_en_5.5.0_3.0_1726719096016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_racial_cross_validation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_racial_cross_validation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_racial_cross_validation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jamnik99/BERT_racial_cross_validation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_sxie3333_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_sxie3333_pipeline_en.md new file mode 100644 index 00000000000000..6b407364537898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_sxie3333_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_sxie3333_pipeline pipeline BertForSequenceClassification from sxie3333 +author: John Snow Labs +name: bert_sxie3333_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sxie3333_pipeline` is a English model originally trained by sxie3333. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sxie3333_pipeline_en_5.5.0_3.0_1726706968327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sxie3333_pipeline_en_5.5.0_3.0_1726706968327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_sxie3333_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_sxie3333_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sxie3333_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/sxie3333/BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bfsee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bfsee_pipeline_en.md new file mode 100644 index 00000000000000..745d8ce323aa41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bfsee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bfsee_pipeline pipeline DistilBertForSequenceClassification from talktojustintoday +author: John Snow Labs +name: bfsee_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bfsee_pipeline` is a English model originally trained by talktojustintoday. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bfsee_pipeline_en_5.5.0_3.0_1726763904228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bfsee_pipeline_en_5.5.0_3.0_1726763904228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bfsee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bfsee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bfsee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.7 MB| + +## References + +https://huggingface.co/talktojustintoday/bfsee + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bge_large_repmus_matryoshka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bge_large_repmus_matryoshka_pipeline_en.md new file mode 100644 index 00000000000000..e00658796151c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bge_large_repmus_matryoshka_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_repmus_matryoshka_pipeline pipeline BGEEmbeddings from tessimago +author: John Snow Labs +name: bge_large_repmus_matryoshka_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_large_repmus_matryoshka_pipeline` is a English model originally trained by tessimago. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_repmus_matryoshka_pipeline_en_5.5.0_3.0_1726720294653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_repmus_matryoshka_pipeline_en_5.5.0_3.0_1726720294653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_repmus_matryoshka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_repmus_matryoshka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_repmus_matryoshka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tessimago/bge-large-repmus-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_3__checkpoint4_en.md b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_3__checkpoint4_en.md new file mode 100644 index 00000000000000..f0aebe37ca2c3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_3__checkpoint4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English brwac_v1_3__checkpoint4 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_3__checkpoint4 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_3__checkpoint4` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint4_en_5.5.0_3.0_1726747028178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint4_en_5.5.0_3.0_1726747028178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("brwac_v1_3__checkpoint4","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("brwac_v1_3__checkpoint4","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_3__checkpoint4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.7 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_3__checkpoint4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_en.md b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_en.md new file mode 100644 index 00000000000000..0254fdfb111cd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English brwac_v1_5__checkpoint_17_62000 RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_5__checkpoint_17_62000 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_5__checkpoint_17_62000` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_17_62000_en_5.5.0_3.0_1726778555961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_17_62000_en_5.5.0_3.0_1726778555961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("brwac_v1_5__checkpoint_17_62000","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("brwac_v1_5__checkpoint_17_62000","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_5__checkpoint_17_62000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.1 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_5__checkpoint_17_62000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_pipeline_en.md new file mode 100644 index 00000000000000..2709afa64033bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-brwac_v1_5__checkpoint_17_62000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brwac_v1_5__checkpoint_17_62000_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_5__checkpoint_17_62000_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_5__checkpoint_17_62000_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_17_62000_pipeline_en_5.5.0_3.0_1726778644134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_17_62000_pipeline_en_5.5.0_3.0_1726778644134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brwac_v1_5__checkpoint_17_62000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brwac_v1_5__checkpoint_17_62000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_5__checkpoint_17_62000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.1 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_5__checkpoint_17_62000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_abdelrahman_hassan_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_abdelrahman_hassan_1_pipeline_en.md new file mode 100644 index 00000000000000..4ec4ca7648d23e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_abdelrahman_hassan_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_abdelrahman_hassan_1_pipeline pipeline DistilBertForSequenceClassification from Abdelrahman-Hassan-1 +author: John Snow Labs +name: burmese_awesome_model_abdelrahman_hassan_1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_abdelrahman_hassan_1_pipeline` is a English model originally trained by Abdelrahman-Hassan-1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_abdelrahman_hassan_1_pipeline_en_5.5.0_3.0_1726741016534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_abdelrahman_hassan_1_pipeline_en_5.5.0_3.0_1726741016534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_abdelrahman_hassan_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_abdelrahman_hassan_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_abdelrahman_hassan_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abdelrahman-Hassan-1/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_bazgha_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_bazgha_en.md new file mode 100644 index 00000000000000..0bd91346abe204 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_bazgha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_bazgha DistilBertForSequenceClassification from bazgha +author: John Snow Labs +name: burmese_awesome_model_bazgha +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bazgha` is a English model originally trained by bazgha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bazgha_en_5.5.0_3.0_1726742602513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bazgha_en_5.5.0_3.0_1726742602513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bazgha","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bazgha", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bazgha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bazgha/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_highcodger10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_highcodger10_pipeline_en.md new file mode 100644 index 00000000000000..ef73f79c1841ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_highcodger10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_highcodger10_pipeline pipeline DistilBertForSequenceClassification from highcodger10 +author: John Snow Labs +name: burmese_awesome_model_highcodger10_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_highcodger10_pipeline` is a English model originally trained by highcodger10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_highcodger10_pipeline_en_5.5.0_3.0_1726704804062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_highcodger10_pipeline_en_5.5.0_3.0_1726704804062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_highcodger10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_highcodger10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_highcodger10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/highcodger10/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_philgrey_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_philgrey_en.md new file mode 100644 index 00000000000000..d06ebeaedfea39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_awesome_model_philgrey_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_philgrey DistilBertForSequenceClassification from philgrey +author: John Snow Labs +name: burmese_awesome_model_philgrey +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_philgrey` is a English model originally trained by philgrey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_philgrey_en_5.5.0_3.0_1726741016052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_philgrey_en_5.5.0_3.0_1726741016052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_philgrey","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_philgrey", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_philgrey| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/philgrey/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-citation_polarity_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-19-citation_polarity_roberta_base_en.md new file mode 100644 index 00000000000000..8257043216a66c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-citation_polarity_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English citation_polarity_roberta_base RoBertaForSequenceClassification from coltekin +author: John Snow Labs +name: citation_polarity_roberta_base +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`citation_polarity_roberta_base` is a English model originally trained by coltekin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/citation_polarity_roberta_base_en_5.5.0_3.0_1726733629262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/citation_polarity_roberta_base_en_5.5.0_3.0_1726733629262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("citation_polarity_roberta_base","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("citation_polarity_roberta_base", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|citation_polarity_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|426.9 MB| + +## References + +https://huggingface.co/coltekin/citation-polarity-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ct_m1_complete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ct_m1_complete_pipeline_en.md new file mode 100644 index 00000000000000..43e988b8b748c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ct_m1_complete_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ct_m1_complete_pipeline pipeline RoBertaEmbeddings from crisistransformers +author: John Snow Labs +name: ct_m1_complete_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ct_m1_complete_pipeline` is a English model originally trained by crisistransformers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ct_m1_complete_pipeline_en_5.5.0_3.0_1726746987857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ct_m1_complete_pipeline_en_5.5.0_3.0_1726746987857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ct_m1_complete_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ct_m1_complete_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ct_m1_complete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|504.7 MB| + +## References + +https://huggingface.co/crisistransformers/CT-M1-Complete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ct_m2_onelook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ct_m2_onelook_pipeline_en.md new file mode 100644 index 00000000000000..ed6264cf9c0445 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ct_m2_onelook_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ct_m2_onelook_pipeline pipeline RoBertaEmbeddings from crisistransformers +author: John Snow Labs +name: ct_m2_onelook_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ct_m2_onelook_pipeline` is a English model originally trained by crisistransformers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ct_m2_onelook_pipeline_en_5.5.0_3.0_1726747259647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ct_m2_onelook_pipeline_en_5.5.0_3.0_1726747259647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ct_m2_onelook_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ct_m2_onelook_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ct_m2_onelook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/crisistransformers/CT-M2-OneLook + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cv9_special_batch8_lr6_small_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-19-cv9_special_batch8_lr6_small_pipeline_id.md new file mode 100644 index 00000000000000..d7478ddf46acae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cv9_special_batch8_lr6_small_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian cv9_special_batch8_lr6_small_pipeline pipeline WhisperForCTC from TheRains +author: John Snow Labs +name: cv9_special_batch8_lr6_small_pipeline +date: 2024-09-19 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cv9_special_batch8_lr6_small_pipeline` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_lr6_small_pipeline_id_5.5.0_3.0_1726756893012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cv9_special_batch8_lr6_small_pipeline_id_5.5.0_3.0_1726756893012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cv9_special_batch8_lr6_small_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cv9_special_batch8_lr6_small_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cv9_special_batch8_lr6_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/cv9-special-batch8-lr6-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-cybert_our_data_en.md b/docs/_posts/ahmedlone127/2024-09-19-cybert_our_data_en.md new file mode 100644 index 00000000000000..16b36c53c34ca8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-cybert_our_data_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English cybert_our_data RoBertaForTokenClassification from anonymouspd +author: John Snow Labs +name: cybert_our_data +date: 2024-09-19 +tags: [roberta, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cybert_our_data` is a English model originally trained by anonymouspd. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cybert_our_data_en_5.5.0_3.0_1726729914290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cybert_our_data_en_5.5.0_3.0_1726729914290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols(["document"]) \ + .setOutputCol("token") + + +tokenClassifier = RoBertaForTokenClassification.pretrained("cybert_our_data","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = Tokenizer() \ + .setInputCols(Array("document")) \ + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification + .pretrained("cybert_our_data", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cybert_our_data| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|311.4 MB| + +## References + +References + +https://huggingface.co/anonymouspd/CyBERT-our-data \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-darija_test8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-darija_test8_pipeline_en.md new file mode 100644 index 00000000000000..f6569d906923f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-darija_test8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English darija_test8_pipeline pipeline BertForSequenceClassification from smerchi +author: John Snow Labs +name: darija_test8_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`darija_test8_pipeline` is a English model originally trained by smerchi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/darija_test8_pipeline_en_5.5.0_3.0_1726736452528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/darija_test8_pipeline_en_5.5.0_3.0_1726736452528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("darija_test8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("darija_test8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|darija_test8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|553.7 MB| + +## References + +https://huggingface.co/smerchi/darija_test8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distil_news_finetune2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distil_news_finetune2_pipeline_en.md new file mode 100644 index 00000000000000..9891c81e994e65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distil_news_finetune2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distil_news_finetune2_pipeline pipeline DistilBertForSequenceClassification from anggari +author: John Snow Labs +name: distil_news_finetune2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_news_finetune2_pipeline` is a English model originally trained by anggari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_news_finetune2_pipeline_en_5.5.0_3.0_1726763643349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_news_finetune2_pipeline_en_5.5.0_3.0_1726763643349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distil_news_finetune2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distil_news_finetune2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_news_finetune2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anggari/distil_news_finetune2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_en.md new file mode 100644 index 00000000000000..bfd2cdb1205482 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_anandharaju DistilBertForQuestionAnswering from Anandharaju +author: John Snow Labs +name: distilbert_anandharaju +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_anandharaju` is a English model originally trained by Anandharaju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_anandharaju_en_5.5.0_3.0_1726786046576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_anandharaju_en_5.5.0_3.0_1726786046576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_anandharaju","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_anandharaju", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_anandharaju| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Anandharaju/distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_pipeline_en.md new file mode 100644 index 00000000000000..2d0780cd1fab4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_anandharaju_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_anandharaju_pipeline pipeline DistilBertForQuestionAnswering from Anandharaju +author: John Snow Labs +name: distilbert_anandharaju_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_anandharaju_pipeline` is a English model originally trained by Anandharaju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_anandharaju_pipeline_en_5.5.0_3.0_1726786057907.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_anandharaju_pipeline_en_5.5.0_3.0_1726786057907.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_anandharaju_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_anandharaju_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_anandharaju_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Anandharaju/distilbert + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_multilingual_cased_regression_finetuned_news_all_xx.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_multilingual_cased_regression_finetuned_news_all_xx.md new file mode 100644 index 00000000000000..9f15f085547009 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_multilingual_cased_regression_finetuned_news_all_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_regression_finetuned_news_all DistilBertForSequenceClassification from Mou11209203 +author: John Snow Labs +name: distilbert_base_multilingual_cased_regression_finetuned_news_all +date: 2024-09-19 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_regression_finetuned_news_all` is a Multilingual model originally trained by Mou11209203. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_regression_finetuned_news_all_xx_5.5.0_3.0_1726743601435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_regression_finetuned_news_all_xx_5.5.0_3.0_1726743601435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_regression_finetuned_news_all","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_regression_finetuned_news_all", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_regression_finetuned_news_all| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.7 MB| + +## References + +https://huggingface.co/Mou11209203/distilbert-base-multilingual-cased_regression_finetuned_news_all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en.md new file mode 100644 index 00000000000000..c0704c45deb24b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_bentleyjeon_pipeline pipeline DistilBertForSequenceClassification from bentleyjeon +author: John Snow Labs +name: distilbert_base_uncased_finetuned_bentleyjeon_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_bentleyjeon_pipeline` is a English model originally trained by bentleyjeon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en_5.5.0_3.0_1726704404450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_bentleyjeon_pipeline_en_5.5.0_3.0_1726704404450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_bentleyjeon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_bentleyjeon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_bentleyjeon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bentleyjeon/distilbert-base-uncased-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_abushady5_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_abushady5_en.md new file mode 100644 index 00000000000000..602bb690c36b5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_cola_abushady5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_abushady5 DistilBertForSequenceClassification from Abushady5 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_abushady5 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_abushady5` is a English model originally trained by Abushady5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_abushady5_en_5.5.0_3.0_1726719324321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_abushady5_en_5.5.0_3.0_1726719324321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_abushady5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_abushady5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_abushady5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abushady5/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en.md new file mode 100644 index 00000000000000..cf02c509bf73cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline pipeline DistilBertForSequenceClassification from bingyizh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline` is a English model originally trained by bingyizh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en_5.5.0_3.0_1726704884403.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline_en_5.5.0_3.0_1726704884403.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_bingyizh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bingyizh/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en.md new file mode 100644 index 00000000000000..08a792d37f3d47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_nickrth_pipeline pipeline DistilBertForSequenceClassification from nickrth +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_nickrth_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_nickrth_pipeline` is a English model originally trained by nickrth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en_5.5.0_3.0_1726719521201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_nickrth_pipeline_en_5.5.0_3.0_1726719521201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nickrth_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_nickrth_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_nickrth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nickrth/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en.md new file mode 100644 index 00000000000000..730220966a7595 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline pipeline DistilBertForSequenceClassification from jjfumero +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline` is a English model originally trained by jjfumero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en_5.5.0_3.0_1726764128924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline_en_5.5.0_3.0_1726764128924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_jjfumero_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jjfumero/distilbert-base-uncased-finetuned-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_amna1015_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_amna1015_en.md new file mode 100644 index 00000000000000..90ee657c18f5c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_amna1015_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_amna1015 DistilBertForQuestionAnswering from Amna1015 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_amna1015 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_amna1015` is a English model originally trained by Amna1015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_amna1015_en_5.5.0_3.0_1726727561754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_amna1015_en_5.5.0_3.0_1726727561754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_amna1015","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_amna1015", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_amna1015| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Amna1015/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_jmoonyoung_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_jmoonyoung_en.md new file mode 100644 index 00000000000000..b5c366d1c6843a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_jmoonyoung_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_jmoonyoung DistilBertForQuestionAnswering from jmoonyoung +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_jmoonyoung +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_jmoonyoung` is a English model originally trained by jmoonyoung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_jmoonyoung_en_5.5.0_3.0_1726748417741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_jmoonyoung_en_5.5.0_3.0_1726748417741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_jmoonyoung","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_jmoonyoung", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_jmoonyoung| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/jmoonyoung/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_msamahero_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_msamahero_en.md new file mode 100644 index 00000000000000..4bf08773d4fef9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_msamahero_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_msamahero DistilBertForQuestionAnswering from msamahero +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_msamahero +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_msamahero` is a English model originally trained by msamahero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_msamahero_en_5.5.0_3.0_1726727855733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_msamahero_en_5.5.0_3.0_1726727855733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_msamahero","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_msamahero", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_msamahero| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/msamahero/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_tuner23652_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_tuner23652_en.md new file mode 100644 index 00000000000000..343a83b8167f09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_squad_tuner23652_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_tuner23652 DistilBertForQuestionAnswering from tuner23652 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_tuner23652 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_tuner23652` is a English model originally trained by tuner23652. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_tuner23652_en_5.5.0_3.0_1726766673187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_tuner23652_en_5.5.0_3.0_1726766673187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_tuner23652","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_tuner23652", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_tuner23652| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/tuner23652/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_fold_4_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_fold_4_en.md new file mode 100644 index 00000000000000..a89f887330f10d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_fold_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_fold_4 DistilBertForSequenceClassification from research-dump +author: John Snow Labs +name: distilbert_base_uncased_fold_4 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fold_4` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fold_4_en_5.5.0_3.0_1726743731338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fold_4_en_5.5.0_3.0_1726743731338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fold_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fold_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fold_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/research-dump/distilbert-base-uncased_fold_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..c68a37bd17f148 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en_5.5.0_3.0_1726744178540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp_en_5.5.0_3.0_1726744178540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_0st_ut12ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_0st_ut12ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_2st_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_2st_en.md new file mode 100644 index 00000000000000..bd0f5c5a5caf4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_small_talk_zphr_2st_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_2st DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_2st +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_2st` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_2st_en_5.5.0_3.0_1726764016565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_2st_en_5.5.0_3.0_1726764016565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_2st","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_small_talk_zphr_2st", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_2st| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_2st \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p45_pipeline_en.md new file mode 100644 index 00000000000000..a6e6bb34aa8808 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_squad2_pruned_p45_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_pruned_p45_pipeline pipeline DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_pruned_p45_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_pruned_p45_pipeline` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p45_pipeline_en_5.5.0_3.0_1726727895991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_pruned_p45_pipeline_en_5.5.0_3.0_1726727895991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_squad2_pruned_p45_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_squad2_pruned_p45_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_pruned_p45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|192.6 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-pruned-p45 + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_emotion_bilalinan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_emotion_bilalinan_pipeline_en.md new file mode 100644 index 00000000000000..428ca82b22e92f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_emotion_bilalinan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_bilalinan_pipeline pipeline DistilBertForSequenceClassification from bilalinan +author: John Snow Labs +name: distilbert_emotion_bilalinan_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_bilalinan_pipeline` is a English model originally trained by bilalinan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_bilalinan_pipeline_en_5.5.0_3.0_1726704293519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_bilalinan_pipeline_en_5.5.0_3.0_1726704293519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_emotion_bilalinan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_emotion_bilalinan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_bilalinan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bilalinan/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en.md new file mode 100644 index 00000000000000..0fc773afe37edf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9 DistilBertForSequenceClassification from aniket-jain-9 +author: John Snow Labs +name: distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9` is a English model originally trained by aniket-jain-9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en_5.5.0_3.0_1726741119425.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9_en_5.5.0_3.0_1726741119425.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lora_finetuned_merged_imdb_sentiment_aniket_jain_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aniket-jain-9/distilbert-lora-finetuned-merged-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_en.md new file mode 100644 index 00000000000000..fd58ea0eecc66c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_sanskrit_saskta_glue_experiment_mnli_192_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_mnli_192 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_mnli_192 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_mnli_192` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mnli_192_en_5.5.0_3.0_1726742896953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mnli_192_en_5.5.0_3.0_1726742896953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_mnli_192","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_mnli_192", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_mnli_192| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_mnli_192 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_turk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turk_pipeline_en.md new file mode 100644 index 00000000000000..de9087a89efb92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_turk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_turk_pipeline pipeline DistilBertForSequenceClassification from alionder +author: John Snow Labs +name: distilbert_turk_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_turk_pipeline` is a English model originally trained by alionder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_turk_pipeline_en_5.5.0_3.0_1726743014804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_turk_pipeline_en_5.5.0_3.0_1726743014804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_turk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_turk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_turk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|254.1 MB| + +## References + +https://huggingface.co/alionder/distilbert_turk + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_en.md new file mode 100644 index 00000000000000..99153d4625f89c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilkobert_ep2 DistilBertForSequenceClassification from yeye776 +author: John Snow Labs +name: distilkobert_ep2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilkobert_ep2` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilkobert_ep2_en_5.5.0_3.0_1726763333004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilkobert_ep2_en_5.5.0_3.0_1726763333004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilkobert_ep2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilkobert_ep2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilkobert_ep2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|106.5 MB| + +## References + +https://huggingface.co/yeye776/DistilKoBERT-ep2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_en.md new file mode 100644 index 00000000000000..f8196531fc33a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_summonerschool RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_summonerschool +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_summonerschool` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_summonerschool_en_5.5.0_3.0_1726747572996.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_summonerschool_en_5.5.0_3.0_1726747572996.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_summonerschool","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_summonerschool","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_summonerschool| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-summonerschool \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_pipeline_en.md new file mode 100644 index 00000000000000..86ec6b4b9c7aa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_base_ft_summonerschool_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_summonerschool_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_summonerschool_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_summonerschool_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_summonerschool_pipeline_en_5.5.0_3.0_1726747588720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_summonerschool_pipeline_en_5.5.0_3.0_1726747588720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_summonerschool_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_summonerschool_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_summonerschool_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-summonerschool + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilroberta_rb156k_opt15_ep40_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_rb156k_opt15_ep40_en.md new file mode 100644 index 00000000000000..382dd759e0adef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilroberta_rb156k_opt15_ep40_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_rb156k_opt15_ep40 RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_rb156k_opt15_ep40 +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_rb156k_opt15_ep40` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep40_en_5.5.0_3.0_1726778375556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep40_en_5.5.0_3.0_1726778375556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_rb156k_opt15_ep40","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_rb156k_opt15_ep40","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_rb156k_opt15_ep40| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|305.9 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-rb156k-opt15-ep40 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en.md b/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en.md new file mode 100644 index 00000000000000..55bc85b1f3ba7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3 RoBertaForSequenceClassification from Abdelrahman-Rezk +author: John Snow Labs +name: emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3` is a English model originally trained by Abdelrahman-Rezk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en_5.5.0_3.0_1726725985729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3_en_5.5.0_3.0_1726725985729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_english_distilroberta_base_fine_tuned_for_amazon_reviews_english_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/Abdelrahman-Rezk/emotion-english-distilroberta-base-fine_tuned_for_amazon_reviews_english_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-fin_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-fin_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..2027739d179ef7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-fin_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fin_sentiment_pipeline pipeline XlmRoBertaForSequenceClassification from thaidv96 +author: John Snow Labs +name: fin_sentiment_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fin_sentiment_pipeline` is a English model originally trained by thaidv96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fin_sentiment_pipeline_en_5.5.0_3.0_1726752848555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fin_sentiment_pipeline_en_5.5.0_3.0_1726752848555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fin_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fin_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fin_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|847.9 MB| + +## References + +https://huggingface.co/thaidv96/fin-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_distilbert_en.md new file mode 100644 index 00000000000000..acd54be0d13320 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-financial_sentiment_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English financial_sentiment_distilbert DistilBertForSequenceClassification from runaksh +author: John Snow Labs +name: financial_sentiment_distilbert +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`financial_sentiment_distilbert` is a English model originally trained by runaksh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/financial_sentiment_distilbert_en_5.5.0_3.0_1726704648580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/financial_sentiment_distilbert_en_5.5.0_3.0_1726704648580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("financial_sentiment_distilbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("financial_sentiment_distilbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|financial_sentiment_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/runaksh/financial_sentiment_distilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_distillbert_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_distillbert_imdb_en.md new file mode 100644 index 00000000000000..a2e090fa367f30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_distillbert_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_distillbert_imdb DistilBertForSequenceClassification from kaustavbhattacharjee +author: John Snow Labs +name: finetuning_distillbert_imdb +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_distillbert_imdb` is a English model originally trained by kaustavbhattacharjee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_distillbert_imdb_en_5.5.0_3.0_1726742454117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_distillbert_imdb_en_5.5.0_3.0_1726742454117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_distillbert_imdb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_distillbert_imdb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_distillbert_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kaustavbhattacharjee/finetuning-DistillBERT-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_abrario_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_abrario_en.md new file mode 100644 index 00000000000000..e99c664662445b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_abrario_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_abrario DistilBertForSequenceClassification from abrario +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_abrario +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_abrario` is a English model originally trained by abrario. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_abrario_en_5.5.0_3.0_1726719150321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_abrario_en_5.5.0_3.0_1726719150321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_abrario","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_abrario", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_abrario| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abrario/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_freedino_en.md b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_freedino_en.md new file mode 100644 index 00000000000000..3759fd53a7d655 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-finetuning_sentiment_model_3000_samples_freedino_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_freedino DistilBertForSequenceClassification from Freedino +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_freedino +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_freedino` is a English model originally trained by Freedino. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_freedino_en_5.5.0_3.0_1726719194691.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_freedino_en_5.5.0_3.0_1726719194691.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_freedino","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_freedino", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_freedino| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Freedino/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ft_10m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ft_10m_pipeline_en.md new file mode 100644 index 00000000000000..ab864759de1378 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ft_10m_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English ft_10m_pipeline pipeline WhisperForCTC from xuliu15 +author: John Snow Labs +name: ft_10m_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_10m_pipeline` is a English model originally trained by xuliu15. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_10m_pipeline_en_5.5.0_3.0_1726716384619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_10m_pipeline_en_5.5.0_3.0_1726716384619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ft_10m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ft_10m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_10m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xuliu15/FT-10m + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_en.md b/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_en.md new file mode 100644 index 00000000000000..4f08727622be91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jerteh355sentneg2 RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentneg2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentneg2` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentneg2_en_5.5.0_3.0_1726732599344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentneg2_en_5.5.0_3.0_1726732599344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentneg2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentneg2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentneg2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTNEG2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_pipeline_en.md new file mode 100644 index 00000000000000..cbc67ccd806f28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-jerteh355sentneg2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jerteh355sentneg2_pipeline pipeline RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentneg2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentneg2_pipeline` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentneg2_pipeline_en_5.5.0_3.0_1726732670152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentneg2_pipeline_en_5.5.0_3.0_1726732670152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jerteh355sentneg2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jerteh355sentneg2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentneg2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTNEG2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-legal_longformer_base_8192_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-19-legal_longformer_base_8192_spanish_en.md new file mode 100644 index 00000000000000..85132d8fb3ed96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-legal_longformer_base_8192_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legal_longformer_base_8192_spanish RoBertaEmbeddings from clibrain +author: John Snow Labs +name: legal_longformer_base_8192_spanish +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_longformer_base_8192_spanish` is a English model originally trained by clibrain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_longformer_base_8192_spanish_en_5.5.0_3.0_1726778162217.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_longformer_base_8192_spanish_en_5.5.0_3.0_1726778162217.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("legal_longformer_base_8192_spanish","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("legal_longformer_base_8192_spanish","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_longformer_base_8192_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|490.6 MB| + +## References + +https://huggingface.co/clibrain/legal-longformer-base-8192-spanish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-mental_health_model_ilabutk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-mental_health_model_ilabutk_pipeline_en.md new file mode 100644 index 00000000000000..9c8ac969b957fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-mental_health_model_ilabutk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mental_health_model_ilabutk_pipeline pipeline DistilBertForSequenceClassification from iLabUtk +author: John Snow Labs +name: mental_health_model_ilabutk_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mental_health_model_ilabutk_pipeline` is a English model originally trained by iLabUtk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mental_health_model_ilabutk_pipeline_en_5.5.0_3.0_1726743084570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mental_health_model_ilabutk_pipeline_en_5.5.0_3.0_1726743084570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mental_health_model_ilabutk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mental_health_model_ilabutk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mental_health_model_ilabutk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/iLabUtk/mental_health_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-model_name_jenniferlimpopo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-model_name_jenniferlimpopo_pipeline_en.md new file mode 100644 index 00000000000000..d55aa05c63638d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-model_name_jenniferlimpopo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_name_jenniferlimpopo_pipeline pipeline DistilBertForSequenceClassification from Jenniferlimpopo +author: John Snow Labs +name: model_name_jenniferlimpopo_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_name_jenniferlimpopo_pipeline` is a English model originally trained by Jenniferlimpopo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_name_jenniferlimpopo_pipeline_en_5.5.0_3.0_1726719102927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_name_jenniferlimpopo_pipeline_en_5.5.0_3.0_1726719102927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_name_jenniferlimpopo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_name_jenniferlimpopo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_name_jenniferlimpopo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jenniferlimpopo/model_name + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_en.md new file mode 100644 index 00000000000000..c16da52f63a146 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_roberta_imdb_padding90model RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: n_roberta_imdb_padding90model +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_roberta_imdb_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_roberta_imdb_padding90model_en_5.5.0_3.0_1726780472885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_roberta_imdb_padding90model_en_5.5.0_3.0_1726780472885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("n_roberta_imdb_padding90model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("n_roberta_imdb_padding90model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_roberta_imdb_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.2 MB| + +## References + +https://huggingface.co/Realgon/N_roberta_imdb_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ndd_phoenix_test_content_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ndd_phoenix_test_content_pipeline_en.md new file mode 100644 index 00000000000000..c7fff2529aebff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ndd_phoenix_test_content_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_phoenix_test_content_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_phoenix_test_content_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_phoenix_test_content_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_phoenix_test_content_pipeline_en_5.5.0_3.0_1726763660569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_phoenix_test_content_pipeline_en_5.5.0_3.0_1726763660569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ndd_phoenix_test_content_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ndd_phoenix_test_content_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_phoenix_test_content_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-phoenix_test-content + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_en.md b/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_en.md new file mode 100644 index 00000000000000..a84c04a4cb70da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English nepal_bhasa_bert BertEmbeddings from onlydj96 +author: John Snow Labs +name: nepal_bhasa_bert +date: 2024-09-19 +tags: [bert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_bert` is a English model originally trained by onlydj96. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_bert_en_5.5.0_3.0_1726705482586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_bert_en_5.5.0_3.0_1726705482586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + + +embeddings =BertEmbeddings.pretrained("nepal_bhasa_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([document_assembler, embeddings]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("embeddings") + +val embeddings = BertEmbeddings + .pretrained("nepal_bhasa_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(document_assembler, embeddings)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.6 MB| + +## References + +References + +https://huggingface.co/onlydj96/new_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_pipeline_en.md new file mode 100644 index 00000000000000..24462ddb197d9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nepal_bhasa_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_bert_pipeline pipeline BertEmbeddings from searchfind +author: John Snow Labs +name: nepal_bhasa_bert_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_bert_pipeline` is a English model originally trained by searchfind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_bert_pipeline_en_5.5.0_3.0_1726705501728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_bert_pipeline_en_5.5.0_3.0_1726705501728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/searchfind/New_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ner_chunkyun_en.md b/docs/_posts/ahmedlone127/2024-09-19-ner_chunkyun_en.md new file mode 100644 index 00000000000000..8f1bb1ea2ddd6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ner_chunkyun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_chunkyun RoBertaForTokenClassification from Chunkyun +author: John Snow Labs +name: ner_chunkyun +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_chunkyun` is a English model originally trained by Chunkyun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_chunkyun_en_5.5.0_3.0_1726730636226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_chunkyun_en_5.5.0_3.0_1726730636226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("ner_chunkyun","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_chunkyun", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_chunkyun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/Chunkyun/ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..5a618163b859f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726750787953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726750787953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random1_seed2_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random1_seed2-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nsina_category_sinbert_small_si.md b/docs/_posts/ahmedlone127/2024-09-19-nsina_category_sinbert_small_si.md new file mode 100644 index 00000000000000..7a6228bbee7e54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nsina_category_sinbert_small_si.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Sinhala, Sinhalese nsina_category_sinbert_small RoBertaForSequenceClassification from sinhala-nlp +author: John Snow Labs +name: nsina_category_sinbert_small +date: 2024-09-19 +tags: [si, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: si +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nsina_category_sinbert_small` is a Sinhala, Sinhalese model originally trained by sinhala-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nsina_category_sinbert_small_si_5.5.0_3.0_1726750932592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nsina_category_sinbert_small_si_5.5.0_3.0_1726750932592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("nsina_category_sinbert_small","si") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("nsina_category_sinbert_small", "si") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nsina_category_sinbert_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|si| +|Size:|249.4 MB| + +## References + +https://huggingface.co/sinhala-nlp/NSINA-Category-sinbert-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-pmp_h768_zh.md b/docs/_posts/ahmedlone127/2024-09-19-pmp_h768_zh.md new file mode 100644 index 00000000000000..98435d00f640d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-pmp_h768_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese pmp_h768 BertForTokenClassification from rickltt +author: John Snow Labs +name: pmp_h768 +date: 2024-09-19 +tags: [zh, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pmp_h768` is a Chinese model originally trained by rickltt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmp_h768_zh_5.5.0_3.0_1726771429597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmp_h768_zh_5.5.0_3.0_1726771429597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("pmp_h768","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("pmp_h768", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmp_h768| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|zh| +|Size:|221.9 MB| + +## References + +https://huggingface.co/rickltt/pmp-h768 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-projectsentimentalanalysis_en.md b/docs/_posts/ahmedlone127/2024-09-19-projectsentimentalanalysis_en.md new file mode 100644 index 00000000000000..fcc5fbe4cbed39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-projectsentimentalanalysis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English projectsentimentalanalysis DistilBertForSequenceClassification from Stevenchee +author: John Snow Labs +name: projectsentimentalanalysis +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`projectsentimentalanalysis` is a English model originally trained by Stevenchee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/projectsentimentalanalysis_en_5.5.0_3.0_1726744065305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/projectsentimentalanalysis_en_5.5.0_3.0_1726744065305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("projectsentimentalanalysis","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("projectsentimentalanalysis", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|projectsentimentalanalysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|248.4 MB| + +## References + +https://huggingface.co/Stevenchee/projectsentimentalanalysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en.md new file mode 100644 index 00000000000000..a2238971258f95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ptcrawl_base_v2_5__checkpoint_2_26000_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_base_v2_5__checkpoint_2_26000_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_base_v2_5__checkpoint_2_26000_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en_5.5.0_3.0_1726749467407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_base_v2_5__checkpoint_2_26000_pipeline_en_5.5.0_3.0_1726749467407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ptcrawl_base_v2_5__checkpoint_2_26000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ptcrawl_base_v2_5__checkpoint_2_26000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_base_v2_5__checkpoint_2_26000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.6 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_base-v2_5__checkpoint_2_26000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_aptner_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_aptner_en.md new file mode 100644 index 00000000000000..6cefcadf352f6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_aptner_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English roberta_aptner RoBertaForTokenClassification from anonymouspd +author: John Snow Labs +name: roberta_aptner +date: 2024-09-19 +tags: [roberta, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_aptner` is a English model originally trained by anonymouspd. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_aptner_en_5.5.0_3.0_1726731172548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_aptner_en_5.5.0_3.0_1726731172548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols(["document"]) \ + .setOutputCol("token") + + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_aptner","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = Tokenizer() \ + .setInputCols(Array("document")) \ + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification + .pretrained("roberta_aptner", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_aptner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|430.3 MB| + +## References + +References + +https://huggingface.co/anonymouspd/RoBERTa-APTNER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_61_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_61_pipeline_en.md new file mode 100644 index 00000000000000..dfa9abe67f34f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_epoch_61_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_61_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_61_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_61_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_61_pipeline_en_5.5.0_3.0_1726778606464.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_61_pipeline_en_5.5.0_3.0_1726778606464.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_61_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_61_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_61_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_61 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_ring_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_ring_en.md new file mode 100644 index 00000000000000..e24883e9d7c964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_ring_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_ring RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_finetuned_ring +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_ring` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_ring_en_5.5.0_3.0_1726750737389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_ring_en_5.5.0_3.0_1726750737389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_ring","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_ring", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_ring| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-finetuned-Ring \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en.md new file mode 100644 index 00000000000000..e7635f24b53af7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_whisper_1ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_whisper_1ep_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_whisper_1ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en_5.5.0_3.0_1726747716215.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_1ep_pipeline_en_5.5.0_3.0_1726747716215.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_whisper_1ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_whisper_1ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_whisper_1ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-whisper-1ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ontonotes_pitangent_ds_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ontonotes_pitangent_ds_pipeline_en.md new file mode 100644 index 00000000000000..80d1e3f360d278 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_ontonotes_pitangent_ds_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_ontonotes_pitangent_ds_pipeline pipeline RoBertaForTokenClassification from pitangent-ds +author: John Snow Labs +name: roberta_base_ontonotes_pitangent_ds_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ontonotes_pitangent_ds_pipeline` is a English model originally trained by pitangent-ds. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ontonotes_pitangent_ds_pipeline_en_5.5.0_3.0_1726729677840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ontonotes_pitangent_ds_pipeline_en_5.5.0_3.0_1726729677840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ontonotes_pitangent_ds_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ontonotes_pitangent_ds_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ontonotes_pitangent_ds_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|449.7 MB| + +## References + +https://huggingface.co/pitangent-ds/roberta-base-ontonotes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_base_xshubhamx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_xshubhamx_pipeline_en.md new file mode 100644 index 00000000000000..7298291b3bd299 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_base_xshubhamx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_xshubhamx_pipeline pipeline RoBertaForSequenceClassification from xshubhamx +author: John Snow Labs +name: roberta_base_xshubhamx_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_xshubhamx_pipeline` is a English model originally trained by xshubhamx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_xshubhamx_pipeline_en_5.5.0_3.0_1726725779379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_xshubhamx_pipeline_en_5.5.0_3.0_1726725779379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_xshubhamx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_xshubhamx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_xshubhamx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|442.5 MB| + +## References + +https://huggingface.co/xshubhamx/roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_iterater_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_iterater_classifier_en.md new file mode 100644 index 00000000000000..deaaa4ba9dd52a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_iterater_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_iterater_classifier RoBertaForSequenceClassification from owanr +author: John Snow Labs +name: roberta_large_iterater_classifier +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_iterater_classifier` is a English model originally trained by owanr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_iterater_classifier_en_5.5.0_3.0_1726751278729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_iterater_classifier_en_5.5.0_3.0_1726751278729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_iterater_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_iterater_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_iterater_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|845.1 MB| + +## References + +https://huggingface.co/owanr/roberta_large_iterater_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_pipeline_en.md new file mode 100644 index 00000000000000..d875220bb56513 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_merged_subtaskb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_merged_subtaskb_pipeline pipeline RoBertaForSequenceClassification from Sansh2003 +author: John Snow Labs +name: roberta_large_merged_subtaskb_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_merged_subtaskb_pipeline` is a English model originally trained by Sansh2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_merged_subtaskb_pipeline_en_5.5.0_3.0_1726726549749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_merged_subtaskb_pipeline_en_5.5.0_3.0_1726726549749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_merged_subtaskb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_merged_subtaskb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_merged_subtaskb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Sansh2003/roberta-large-merged-subtaskB + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_en.md new file mode 100644 index 00000000000000..2a2ec984678f2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_ppt_occitan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_ppt_occitan RoBertaEmbeddings from mehrshadk +author: John Snow Labs +name: roberta_large_ppt_occitan +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_ppt_occitan` is a English model originally trained by mehrshadk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_ppt_occitan_en_5.5.0_3.0_1726749573508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_ppt_occitan_en_5.5.0_3.0_1726749573508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_large_ppt_occitan","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_large_ppt_occitan","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_ppt_occitan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mehrshadk/roberta_Large_ppt_OC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en.md new file mode 100644 index 00000000000000..4915ff473ba31f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_sentiment_sst5_mapped_grouped_0_pipeline pipeline RoBertaForSequenceClassification from kohankhaki +author: John Snow Labs +name: roberta_large_sentiment_sst5_mapped_grouped_0_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_sentiment_sst5_mapped_grouped_0_pipeline` is a English model originally trained by kohankhaki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en_5.5.0_3.0_1726733372453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_sentiment_sst5_mapped_grouped_0_pipeline_en_5.5.0_3.0_1726733372453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_sentiment_sst5_mapped_grouped_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_sentiment_sst5_mapped_grouped_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_sentiment_sst5_mapped_grouped_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kohankhaki/roberta-large-sentiment-sst5-mapped-grouped-0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_persian_farsi_zwnj_base_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_persian_farsi_zwnj_base_v1_pipeline_en.md new file mode 100644 index 00000000000000..90a21af755309e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_persian_farsi_zwnj_base_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_persian_farsi_zwnj_base_v1_pipeline pipeline RoBertaForSequenceClassification from soltaniali +author: John Snow Labs +name: roberta_persian_farsi_zwnj_base_v1_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_persian_farsi_zwnj_base_v1_pipeline` is a English model originally trained by soltaniali. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_persian_farsi_zwnj_base_v1_pipeline_en_5.5.0_3.0_1726733368861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_persian_farsi_zwnj_base_v1_pipeline_en_5.5.0_3.0_1726733368861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_persian_farsi_zwnj_base_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_persian_farsi_zwnj_base_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_persian_farsi_zwnj_base_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|444.3 MB| + +## References + +https://huggingface.co/soltaniali/roberta-fa-zwnj-base_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_pipeline_en.md new file mode 100644 index 00000000000000..6fa27bb3cfbb6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertabase_subjectivity_1_actual_pipeline pipeline RoBertaForSequenceClassification from Muffins987 +author: John Snow Labs +name: robertabase_subjectivity_1_actual_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertabase_subjectivity_1_actual_pipeline` is a English model originally trained by Muffins987. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertabase_subjectivity_1_actual_pipeline_en_5.5.0_3.0_1726780236224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertabase_subjectivity_1_actual_pipeline_en_5.5.0_3.0_1726780236224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertabase_subjectivity_1_actual_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertabase_subjectivity_1_actual_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertabase_subjectivity_1_actual_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.5 MB| + +## References + +https://huggingface.co/Muffins987/robertabase-subjectivity-1-actual + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-rotten_tomatoes_roberta_base_seed_2_en.md b/docs/_posts/ahmedlone127/2024-09-19-rotten_tomatoes_roberta_base_seed_2_en.md new file mode 100644 index 00000000000000..45766e096787d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-rotten_tomatoes_roberta_base_seed_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rotten_tomatoes_roberta_base_seed_2 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: rotten_tomatoes_roberta_base_seed_2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rotten_tomatoes_roberta_base_seed_2` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rotten_tomatoes_roberta_base_seed_2_en_5.5.0_3.0_1726750668565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rotten_tomatoes_roberta_base_seed_2_en_5.5.0_3.0_1726750668565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("rotten_tomatoes_roberta_base_seed_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("rotten_tomatoes_roberta_base_seed_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rotten_tomatoes_roberta_base_seed_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.9 MB| + +## References + +https://huggingface.co/utahnlp/rotten_tomatoes_roberta-base_seed-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-ruperta_base_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-19-ruperta_base_pipeline_es.md new file mode 100644 index 00000000000000..3ddae9ef938527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-ruperta_base_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish ruperta_base_pipeline pipeline RoBertaEmbeddings from mrm8488 +author: John Snow Labs +name: ruperta_base_pipeline +date: 2024-09-19 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruperta_base_pipeline` is a Castilian, Spanish model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruperta_base_pipeline_es_5.5.0_3.0_1726747293480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruperta_base_pipeline_es_5.5.0_3.0_1726747293480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ruperta_base_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ruperta_base_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruperta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|470.0 MB| + +## References + +https://huggingface.co/mrm8488/RuPERTa-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_aethiqs_gembert_bertje_50k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_aethiqs_gembert_bertje_50k_pipeline_en.md new file mode 100644 index 00000000000000..9c2f6b13796840 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_aethiqs_gembert_bertje_50k_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_aethiqs_gembert_bertje_50k_pipeline pipeline BertSentenceEmbeddings from AethiQs-Max +author: John Snow Labs +name: sent_aethiqs_gembert_bertje_50k_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_aethiqs_gembert_bertje_50k_pipeline` is a English model originally trained by AethiQs-Max. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_aethiqs_gembert_bertje_50k_pipeline_en_5.5.0_3.0_1726782842624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_aethiqs_gembert_bertje_50k_pipeline_en_5.5.0_3.0_1726782842624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_aethiqs_gembert_bertje_50k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_aethiqs_gembert_bertje_50k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_aethiqs_gembert_bertje_50k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/AethiQs-Max/AethiQs_GemBERT_bertje_50k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_nepnewsbert_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_nepnewsbert_en.md new file mode 100644 index 00000000000000..7468f6d6761274 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_nepnewsbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_nepnewsbert BertSentenceEmbeddings from Shushant +author: John Snow Labs +name: sent_nepnewsbert +date: 2024-09-19 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_nepnewsbert` is a English model originally trained by Shushant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_nepnewsbert_en_5.5.0_3.0_1726728652900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_nepnewsbert_en_5.5.0_3.0_1726728652900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_nepnewsbert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_nepnewsbert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_nepnewsbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.7 MB| + +## References + +https://huggingface.co/Shushant/NepNewsBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sentiment_multilingual_distilbert_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-19-sentiment_multilingual_distilbert_pipeline_xx.md new file mode 100644 index 00000000000000..972b56cfa0a536 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sentiment_multilingual_distilbert_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual sentiment_multilingual_distilbert_pipeline pipeline DistilBertForSequenceClassification from Mukalingam0813 +author: John Snow Labs +name: sentiment_multilingual_distilbert_pipeline +date: 2024-09-19 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_multilingual_distilbert_pipeline` is a Multilingual model originally trained by Mukalingam0813. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_multilingual_distilbert_pipeline_xx_5.5.0_3.0_1726741391537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_multilingual_distilbert_pipeline_xx_5.5.0_3.0_1726741391537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_multilingual_distilbert_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_multilingual_distilbert_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_multilingual_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.4 MB| + +## References + +https://huggingface.co/Mukalingam0813/sentiment-multilingual-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test7_balanced_and_sentence_en.md b/docs/_posts/ahmedlone127/2024-09-19-test7_balanced_and_sentence_en.md new file mode 100644 index 00000000000000..2a0a9f6ffc1cb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test7_balanced_and_sentence_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test7_balanced_and_sentence RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: test7_balanced_and_sentence +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test7_balanced_and_sentence` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test7_balanced_and_sentence_en_5.5.0_3.0_1726750646809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test7_balanced_and_sentence_en_5.5.0_3.0_1726750646809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("test7_balanced_and_sentence","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("test7_balanced_and_sentence", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test7_balanced_and_sentence| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|465.4 MB| + +## References + +https://huggingface.co/adriansanz/test7_balanced_and_sentence \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_camsaid_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_camsaid_en.md new file mode 100644 index 00000000000000..1eb7e0f9b7355e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_camsaid_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_camsaid DistilBertForSequenceClassification from CamSaid +author: John Snow Labs +name: test_camsaid +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_camsaid` is a English model originally trained by CamSaid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_camsaid_en_5.5.0_3.0_1726741230172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_camsaid_en_5.5.0_3.0_1726741230172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_camsaid","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_camsaid", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_camsaid| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CamSaid/Test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en.md new file mode 100644 index 00000000000000..1aa757b15b480d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline pipeline RoBertaForTokenClassification from manucos +author: John Snow Labs +name: test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en_5.5.0_3.0_1726730670572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline_en_5.5.0_3.0_1726730670572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_finetuned__roberta_base_biomedical_clinical_spanish__59k_ultrasounds_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.5 MB| + +## References + +https://huggingface.co/manucos/test-finetuned__roberta-base-biomedical-clinical-es__59k-ultrasounds-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-test_harmmie_en.md b/docs/_posts/ahmedlone127/2024-09-19-test_harmmie_en.md new file mode 100644 index 00000000000000..41f14335fd7f0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-test_harmmie_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_harmmie DistilBertForSequenceClassification from Harmmie +author: John Snow Labs +name: test_harmmie +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_harmmie` is a English model originally trained by Harmmie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_harmmie_en_5.5.0_3.0_1726763938500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_harmmie_en_5.5.0_3.0_1726763938500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_harmmie","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_harmmie", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_harmmie| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Harmmie/test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-testing_model_katowtkkk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-testing_model_katowtkkk_pipeline_en.md new file mode 100644 index 00000000000000..a081c6a110ffc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-testing_model_katowtkkk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English testing_model_katowtkkk_pipeline pipeline DistilBertForSequenceClassification from katowtkkk +author: John Snow Labs +name: testing_model_katowtkkk_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testing_model_katowtkkk_pipeline` is a English model originally trained by katowtkkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testing_model_katowtkkk_pipeline_en_5.5.0_3.0_1726740667789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testing_model_katowtkkk_pipeline_en_5.5.0_3.0_1726740667789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("testing_model_katowtkkk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("testing_model_katowtkkk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testing_model_katowtkkk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/katowtkkk/testing_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-trainedlocation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-trainedlocation_pipeline_en.md new file mode 100644 index 00000000000000..6bad08921b3577 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-trainedlocation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainedlocation_pipeline pipeline DistilBertForSequenceClassification from PathofthePeople +author: John Snow Labs +name: trainedlocation_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainedlocation_pipeline` is a English model originally trained by PathofthePeople. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainedlocation_pipeline_en_5.5.0_3.0_1726743874102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainedlocation_pipeline_en_5.5.0_3.0_1726743874102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trainedlocation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trainedlocation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainedlocation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PathofthePeople/TrainedLocation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_en.md b/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_en.md new file mode 100644 index 00000000000000..c5e8cd9a4dac73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainer_chapter5 DistilBertForSequenceClassification from AlanHou +author: John Snow Labs +name: trainer_chapter5 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer_chapter5` is a English model originally trained by AlanHou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer_chapter5_en_5.5.0_3.0_1726743424391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer_chapter5_en_5.5.0_3.0_1726743424391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer_chapter5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer_chapter5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer_chapter5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AlanHou/trainer-chapter5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_pipeline_en.md new file mode 100644 index 00000000000000..90fbbc5a87f63a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-trainer_chapter5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainer_chapter5_pipeline pipeline DistilBertForSequenceClassification from AlanHou +author: John Snow Labs +name: trainer_chapter5_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer_chapter5_pipeline` is a English model originally trained by AlanHou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer_chapter5_pipeline_en_5.5.0_3.0_1726743440550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer_chapter5_pipeline_en_5.5.0_3.0_1726743440550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trainer_chapter5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trainer_chapter5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer_chapter5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AlanHou/trainer-chapter5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_big_kmon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_big_kmon_pipeline_en.md new file mode 100644 index 00000000000000..249947ac0941a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_big_kmon_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_big_kmon_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_big_kmon_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_big_kmon_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_big_kmon_pipeline_en_5.5.0_3.0_1726757913058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_big_kmon_pipeline_en_5.5.0_3.0_1726757913058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_big_kmon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_big_kmon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_big_kmon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-big-kmon + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_romanian_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_romanian_en.md new file mode 100644 index 00000000000000..856356281fb17b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_romanian_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_romanian WhisperForCTC from readerbench +author: John Snow Labs +name: whisper_romanian +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_romanian` is a English model originally trained by readerbench. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_romanian_en_5.5.0_3.0_1726758952893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_romanian_en_5.5.0_3.0_1726758952893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_romanian","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_romanian", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_romanian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/readerbench/whisper-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_slovak_small_augmented_pipeline_sk.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_slovak_small_augmented_pipeline_sk.md new file mode 100644 index 00000000000000..1e9d7fefcde8d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_slovak_small_augmented_pipeline_sk.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Slovak whisper_slovak_small_augmented_pipeline pipeline WhisperForCTC from ALM +author: John Snow Labs +name: whisper_slovak_small_augmented_pipeline +date: 2024-09-19 +tags: [sk, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_slovak_small_augmented_pipeline` is a Slovak model originally trained by ALM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_slovak_small_augmented_pipeline_sk_5.5.0_3.0_1726787879617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_slovak_small_augmented_pipeline_sk_5.5.0_3.0_1726787879617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_slovak_small_augmented_pipeline", lang = "sk") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_slovak_small_augmented_pipeline", lang = "sk") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_slovak_small_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sk| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ALM/whisper-sk-small-augmented + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_mohammadjamalaldeen_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_mohammadjamalaldeen_pipeline_ar.md new file mode 100644 index 00000000000000..1d7ca906093f93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_mohammadjamalaldeen_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabic_mohammadjamalaldeen_pipeline pipeline WhisperForCTC from MohammadJamalaldeen +author: John Snow Labs +name: whisper_small_arabic_mohammadjamalaldeen_pipeline +date: 2024-09-19 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_mohammadjamalaldeen_pipeline` is a Arabic model originally trained by MohammadJamalaldeen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_mohammadjamalaldeen_pipeline_ar_5.5.0_3.0_1726714695319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_mohammadjamalaldeen_pipeline_ar_5.5.0_3.0_1726714695319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabic_mohammadjamalaldeen_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabic_mohammadjamalaldeen_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_mohammadjamalaldeen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/MohammadJamalaldeen/whisper-small-arabic + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_noise2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_noise2_pipeline_en.md new file mode 100644 index 00000000000000..e2b7caca3c1763 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_arabic_noise2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_arabic_noise2_pipeline pipeline WhisperForCTC from MohammedNasri +author: John Snow Labs +name: whisper_small_arabic_noise2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_noise2_pipeline` is a English model originally trained by MohammedNasri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_noise2_pipeline_en_5.5.0_3.0_1726788665983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_noise2_pipeline_en_5.5.0_3.0_1726788665983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabic_noise2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabic_noise2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_noise2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/MohammedNasri/whisper-small-ar-Noise2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_jackoyoungblood_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_jackoyoungblood_en.md new file mode 100644 index 00000000000000..07aa1c1b49817a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_divehi_jackoyoungblood_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_divehi_jackoyoungblood WhisperForCTC from jackoyoungblood +author: John Snow Labs +name: whisper_small_divehi_jackoyoungblood +date: 2024-09-19 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_jackoyoungblood` is a English model originally trained by jackoyoungblood. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_jackoyoungblood_en_5.5.0_3.0_1726759883402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_jackoyoungblood_en_5.5.0_3.0_1726759883402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_divehi_jackoyoungblood","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_jackoyoungblood", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_jackoyoungblood| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jackoyoungblood/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_small_icelandic_is.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_icelandic_is.md new file mode 100644 index 00000000000000..8071e50457b21f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_small_icelandic_is.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Icelandic whisper_small_icelandic WhisperForCTC from Valdimarb13 +author: John Snow Labs +name: whisper_small_icelandic +date: 2024-09-19 +tags: [is, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: is +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_icelandic` is a Icelandic model originally trained by Valdimarb13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_icelandic_is_5.5.0_3.0_1726758164834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_icelandic_is_5.5.0_3.0_1726758164834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_icelandic","is") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_icelandic", "is") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_icelandic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|is| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Valdimarb13/whisper-small-icelandic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_minds14_magnustragardh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_minds14_magnustragardh_pipeline_en.md new file mode 100644 index 00000000000000..73008df602f64a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_english_minds14_magnustragardh_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_minds14_magnustragardh_pipeline pipeline WhisperForCTC from magnustragardh +author: John Snow Labs +name: whisper_tiny_english_minds14_magnustragardh_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_minds14_magnustragardh_pipeline` is a English model originally trained by magnustragardh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_minds14_magnustragardh_pipeline_en_5.5.0_3.0_1726789018902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_minds14_magnustragardh_pipeline_en_5.5.0_3.0_1726789018902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_english_minds14_magnustragardh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_english_minds14_magnustragardh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_minds14_magnustragardh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/magnustragardh/whisper-tiny-en-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_indonesian_cahya_id.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_indonesian_cahya_id.md new file mode 100644 index 00000000000000..674670158389b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_indonesian_cahya_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian whisper_tiny_indonesian_cahya WhisperForCTC from cahya +author: John Snow Labs +name: whisper_tiny_indonesian_cahya +date: 2024-09-19 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_indonesian_cahya` is a Indonesian model originally trained by cahya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_indonesian_cahya_id_5.5.0_3.0_1726712665653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_indonesian_cahya_id_5.5.0_3.0_1726712665653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_indonesian_cahya","id") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_indonesian_cahya", "id") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_indonesian_cahya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|389.8 MB| + +## References + +https://huggingface.co/cahya/whisper-tiny-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_serbian_combined_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_serbian_combined_pipeline_en.md new file mode 100644 index 00000000000000..e91097c9923938 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_serbian_combined_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_serbian_combined_pipeline pipeline WhisperForCTC from cminja +author: John Snow Labs +name: whisper_tiny_serbian_combined_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_serbian_combined_pipeline` is a English model originally trained by cminja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_serbian_combined_pipeline_en_5.5.0_3.0_1726788912059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_serbian_combined_pipeline_en_5.5.0_3.0_1726788912059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_serbian_combined_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_serbian_combined_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_serbian_combined_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/cminja/whisper-tiny-sr-combined + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en.md new file mode 100644 index 00000000000000..39dca5ddcf8102 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline` is a English model originally trained by hhffxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en_5.5.0_3.0_1726708689902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline_en_5.5.0_3.0_1726708689902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_hhffxx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|837.2 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_french_occupy1_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_french_occupy1_en.md new file mode 100644 index 00000000000000..e74977de7ee78f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_french_occupy1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_occupy1 XlmRoBertaForTokenClassification from occupy1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_occupy1 +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_occupy1` is a English model originally trained by occupy1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_occupy1_en_5.5.0_3.0_1726754400059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_occupy1_en_5.5.0_3.0_1726754400059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_occupy1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_occupy1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_occupy1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/occupy1/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en.md new file mode 100644 index 00000000000000..65de7ceba47b78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline pipeline XlmRoBertaForTokenClassification from pockypocky +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline` is a English model originally trained by pockypocky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en_5.5.0_3.0_1726708900954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline_en_5.5.0_3.0_1726708900954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_pockypocky_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/pockypocky/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en.md new file mode 100644 index 00000000000000..d227d809a9cffa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_princedl_pipeline pipeline XlmRoBertaForTokenClassification from princedl +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_princedl_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_princedl_pipeline` is a English model originally trained by princedl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en_5.5.0_3.0_1726711488347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_princedl_pipeline_en_5.5.0_3.0_1726711488347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_princedl_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_princedl_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_princedl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|844.5 MB| + +## References + +https://huggingface.co/princedl/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en.md new file mode 100644 index 00000000000000..7c8c7baa94c3c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline pipeline XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline` is a English model originally trained by ashrielbrian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en_5.5.0_3.0_1726708506081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline_en_5.5.0_3.0_1726708506081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_ashrielbrian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_xrchen11_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_xrchen11_en.md new file mode 100644 index 00000000000000..a745a10fe613cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_italian_xrchen11_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_xrchen11 XlmRoBertaForTokenClassification from xrchen11 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_xrchen11 +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_xrchen11` is a English model originally trained by xrchen11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_xrchen11_en_5.5.0_3.0_1726753793660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_xrchen11_en_5.5.0_3.0_1726753793660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_xrchen11","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_xrchen11", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_xrchen11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/xrchen11/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..fb998ce4c03384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1726752815153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1726752815153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_0001_seed42_basic_original_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|821.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.0001_seed42_basic_original_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-20240304_01_hf_en.md b/docs/_posts/ahmedlone127/2024-09-20-20240304_01_hf_en.md new file mode 100644 index 00000000000000..b493083792b442 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-20240304_01_hf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 20240304_01_hf DistilBertForSequenceClassification from linweichen +author: John Snow Labs +name: 20240304_01_hf +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`20240304_01_hf` is a English model originally trained by linweichen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/20240304_01_hf_en_5.5.0_3.0_1726841426416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/20240304_01_hf_en_5.5.0_3.0_1726841426416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("20240304_01_hf","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("20240304_01_hf", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|20240304_01_hf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/linweichen/20240304_01_HF \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-27th2024dash1_distilbert_base_uncased_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-27th2024dash1_distilbert_base_uncased_2_pipeline_en.md new file mode 100644 index 00000000000000..7f261fd1e9b260 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-27th2024dash1_distilbert_base_uncased_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 27th2024dash1_distilbert_base_uncased_2_pipeline pipeline DistilBertForSequenceClassification from blockchain17171 +author: John Snow Labs +name: 27th2024dash1_distilbert_base_uncased_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`27th2024dash1_distilbert_base_uncased_2_pipeline` is a English model originally trained by blockchain17171. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/27th2024dash1_distilbert_base_uncased_2_pipeline_en_5.5.0_3.0_1726809315447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/27th2024dash1_distilbert_base_uncased_2_pipeline_en_5.5.0_3.0_1726809315447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("27th2024dash1_distilbert_base_uncased_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("27th2024dash1_distilbert_base_uncased_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|27th2024dash1_distilbert_base_uncased_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/blockchain17171/27th2024DASH1-distilbert-base-uncased-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en.md new file mode 100644 index 00000000000000..4218a40c44d8b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline pipeline DistilBertForSequenceClassification from littlepinhorse +author: John Snow Labs +name: 4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline` is a English model originally trained by littlepinhorse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en_5.5.0_3.0_1726792035678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline_en_5.5.0_3.0_1726792035678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|4_datasets_fake_news_with_covid_balanced_others_norwegian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/littlepinhorse/4_datasets_fake_news_with_covid_balanced_others_no + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_en.md b/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_en.md new file mode 100644 index 00000000000000..422fd8c96f68df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English affilgood_ner_test_v4 RoBertaForTokenClassification from nicolauduran45 +author: John Snow Labs +name: affilgood_ner_test_v4 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`affilgood_ner_test_v4` is a English model originally trained by nicolauduran45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v4_en_5.5.0_3.0_1726846859587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v4_en_5.5.0_3.0_1726846859587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("affilgood_ner_test_v4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("affilgood_ner_test_v4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|affilgood_ner_test_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/nicolauduran45/affilgood-ner-test-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_pipeline_en.md new file mode 100644 index 00000000000000..e26842a6f5ab54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-affilgood_ner_test_v4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English affilgood_ner_test_v4_pipeline pipeline RoBertaForTokenClassification from nicolauduran45 +author: John Snow Labs +name: affilgood_ner_test_v4_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`affilgood_ner_test_v4_pipeline` is a English model originally trained by nicolauduran45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v4_pipeline_en_5.5.0_3.0_1726846882261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/affilgood_ner_test_v4_pipeline_en_5.5.0_3.0_1726846882261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("affilgood_ner_test_v4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("affilgood_ner_test_v4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|affilgood_ner_test_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/nicolauduran45/affilgood-ner-test-v4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_auto_and_commute_16_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_auto_and_commute_16_16_5_oos_en.md new file mode 100644 index 00000000000000..c77825cbadfb5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_auto_and_commute_16_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_auto_and_commute_16_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_auto_and_commute_16_16_5_oos +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_auto_and_commute_16_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_16_16_5_oos_en_5.5.0_3.0_1726805469779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_auto_and_commute_16_16_5_oos_en_5.5.0_3.0_1726805469779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_16_16_5_oos","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_auto_and_commute_16_16_5_oos", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_auto_and_commute_16_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-auto_and_commute-16-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-autotrain_63go1_k0lzp_en.md b/docs/_posts/ahmedlone127/2024-09-20-autotrain_63go1_k0lzp_en.md new file mode 100644 index 00000000000000..45303ba656dfb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-autotrain_63go1_k0lzp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_63go1_k0lzp DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: autotrain_63go1_k0lzp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_63go1_k0lzp` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_63go1_k0lzp_en_5.5.0_3.0_1726860726102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_63go1_k0lzp_en_5.5.0_3.0_1726860726102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("autotrain_63go1_k0lzp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("autotrain_63go1_k0lzp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_63go1_k0lzp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/autotrain-63go1-k0lzp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-autotrain_v2v7o_9tu3d_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-autotrain_v2v7o_9tu3d_pipeline_en.md new file mode 100644 index 00000000000000..9a053ff3947261 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-autotrain_v2v7o_9tu3d_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_v2v7o_9tu3d_pipeline pipeline DistilBertForSequenceClassification from cuwfnguyen +author: John Snow Labs +name: autotrain_v2v7o_9tu3d_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_v2v7o_9tu3d_pipeline` is a English model originally trained by cuwfnguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_v2v7o_9tu3d_pipeline_en_5.5.0_3.0_1726792387685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_v2v7o_9tu3d_pipeline_en_5.5.0_3.0_1726792387685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_v2v7o_9tu3d_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_v2v7o_9tu3d_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_v2v7o_9tu3d_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/cuwfnguyen/autotrain-v2v7o-9tu3d + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bbc_en.md b/docs/_posts/ahmedlone127/2024-09-20-bbc_en.md new file mode 100644 index 00000000000000..130a5157b5e75d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bbc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bbc DistilBertForSequenceClassification from NawinCom +author: John Snow Labs +name: bbc +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bbc` is a English model originally trained by NawinCom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bbc_en_5.5.0_3.0_1726842175208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bbc_en_5.5.0_3.0_1726842175208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bbc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bbc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bbc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/NawinCom/BBC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko.md new file mode 100644 index 00000000000000..43b8b1ba122caf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline pipeline BertForQuestionAnswering from jihoonkimharu +author: John Snow Labs +name: bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline +date: 2024-09-20 +tags: [ko, open_source, pipeline, onnx] +task: Question Answering +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline` is a Korean model originally trained by jihoonkimharu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko_5.5.0_3.0_1726820662656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline_ko_5.5.0_3.0_1726820662656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_klue_mrc_finetuned_jihoonkimharu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|412.4 MB| + +## References + +https://huggingface.co/jihoonkimharu/bert-base-klue-mrc-finetuned + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_pipeline_en.md new file mode 100644 index 00000000000000..fb25cab68f16ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_portuguese_cased_tiagosanti_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_portuguese_cased_tiagosanti_pipeline pipeline BertForSequenceClassification from TiagoSanti +author: John Snow Labs +name: bert_base_portuguese_cased_tiagosanti_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_tiagosanti_pipeline` is a English model originally trained by TiagoSanti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_tiagosanti_pipeline_en_5.5.0_3.0_1726859860251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_tiagosanti_pipeline_en_5.5.0_3.0_1726859860251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_portuguese_cased_tiagosanti_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_portuguese_cased_tiagosanti_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_tiagosanti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/TiagoSanti/bert-base-portuguese-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en.md new file mode 100644 index 00000000000000..e2f1c501429aca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline pipeline BertForQuestionAnswering from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en_5.5.0_3.0_1726833687506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline_en_5.5.0_3.0_1726833687506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_finetuned_qa_sqac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased-finetuned-qa-sqac + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en.md new file mode 100644 index 00000000000000..0934768f835a3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1726833715719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1726833715719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_5_swati_0_southern_sotho_true_fh_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.5-ss-0-st-True-fh-True + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_en.md new file mode 100644 index 00000000000000..d9cc8409103b00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetuned_set_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_set_3 BertForSequenceClassification from joetey +author: John Snow Labs +name: bert_base_uncased_finetuned_set_3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_set_3` is a English model originally trained by joetey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_set_3_en_5.5.0_3.0_1726797266587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_set_3_en_5.5.0_3.0_1726797266587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_set_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_set_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_set_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/joetey/bert-base-uncased-finetuned-set_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_en.md new file mode 100644 index 00000000000000..156b09749c908b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_qqp_modeltc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_qqp_modeltc BertForSequenceClassification from ModelTC +author: John Snow Labs +name: bert_base_uncased_qqp_modeltc +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qqp_modeltc` is a English model originally trained by ModelTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qqp_modeltc_en_5.5.0_3.0_1726828569797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qqp_modeltc_en_5.5.0_3.0_1726828569797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_qqp_modeltc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_qqp_modeltc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qqp_modeltc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ModelTC/bert-base-uncased-qqp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_vitamin_c_fact_verification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_vitamin_c_fact_verification_pipeline_en.md new file mode 100644 index 00000000000000..770f06365c1289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_vitamin_c_fact_verification_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_vitamin_c_fact_verification_pipeline pipeline BertForQuestionAnswering from DunnBC22 +author: John Snow Labs +name: bert_base_uncased_vitamin_c_fact_verification_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_vitamin_c_fact_verification_pipeline` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_vitamin_c_fact_verification_pipeline_en_5.5.0_3.0_1726820777448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_vitamin_c_fact_verification_pipeline_en_5.5.0_3.0_1726820777448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_vitamin_c_fact_verification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_vitamin_c_fact_verification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_vitamin_c_fact_verification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-uncased-Vitamin_C_Fact_Verification + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_fine_tuned_cola_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_fine_tuned_cola_pipeline_en.md new file mode 100644 index 00000000000000..5095df5dbbce6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_fine_tuned_cola_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_fine_tuned_cola_pipeline pipeline DistilBertForSequenceClassification from erden00 +author: John Snow Labs +name: bert_fine_tuned_cola_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_fine_tuned_cola_pipeline` is a English model originally trained by erden00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_fine_tuned_cola_pipeline_en_5.5.0_3.0_1726841564036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_fine_tuned_cola_pipeline_en_5.5.0_3.0_1726841564036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_fine_tuned_cola_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_fine_tuned_cola_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_fine_tuned_cola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/erden00/bert-fine-tuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_fromscratch_galician_xlarge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_fromscratch_galician_xlarge_pipeline_en.md new file mode 100644 index 00000000000000..78a73217728a0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_fromscratch_galician_xlarge_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_fromscratch_galician_xlarge_pipeline pipeline RoBertaEmbeddings from fpuentes +author: John Snow Labs +name: bert_fromscratch_galician_xlarge_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_fromscratch_galician_xlarge_pipeline` is a English model originally trained by fpuentes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_fromscratch_galician_xlarge_pipeline_en_5.5.0_3.0_1726793740355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_fromscratch_galician_xlarge_pipeline_en_5.5.0_3.0_1726793740355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_fromscratch_galician_xlarge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_fromscratch_galician_xlarge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_fromscratch_galician_xlarge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fpuentes/bert-fromscratch-galician-xlarge + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_en.md new file mode 100644 index 00000000000000..1a1f4dfe6425d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_jigsaw_severetoxic BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_jigsaw_severetoxic +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_jigsaw_severetoxic` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_jigsaw_severetoxic_en_5.5.0_3.0_1726859936954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_jigsaw_severetoxic_en_5.5.0_3.0_1726859936954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_jigsaw_severetoxic","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_jigsaw_severetoxic", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_jigsaw_severetoxic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-jigsaw-severetoxic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_multilingual_uncased_intelligence_headlines_xx.md b/docs/_posts/ahmedlone127/2024-09-20-bert_multilingual_uncased_intelligence_headlines_xx.md new file mode 100644 index 00000000000000..f98e545bb77a1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_multilingual_uncased_intelligence_headlines_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_multilingual_uncased_intelligence_headlines BertForSequenceClassification from nlpodyssey +author: John Snow Labs +name: bert_multilingual_uncased_intelligence_headlines +date: 2024-09-20 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multilingual_uncased_intelligence_headlines` is a Multilingual model originally trained by nlpodyssey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multilingual_uncased_intelligence_headlines_xx_5.5.0_3.0_1726795413707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multilingual_uncased_intelligence_headlines_xx_5.5.0_3.0_1726795413707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_multilingual_uncased_intelligence_headlines","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_multilingual_uncased_intelligence_headlines", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multilingual_uncased_intelligence_headlines| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|628.1 MB| + +## References + +https://huggingface.co/nlpodyssey/bert-multilingual-uncased-intelligence-headlines \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_sentiment_persian_farsi_rasooli3003_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_sentiment_persian_farsi_rasooli3003_pipeline_en.md new file mode 100644 index 00000000000000..9bd819703b7475 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_sentiment_persian_farsi_rasooli3003_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_sentiment_persian_farsi_rasooli3003_pipeline pipeline BertForSequenceClassification from Rasooli3003 +author: John Snow Labs +name: bert_sentiment_persian_farsi_rasooli3003_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sentiment_persian_farsi_rasooli3003_pipeline` is a English model originally trained by Rasooli3003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sentiment_persian_farsi_rasooli3003_pipeline_en_5.5.0_3.0_1726829039256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sentiment_persian_farsi_rasooli3003_pipeline_en_5.5.0_3.0_1726829039256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_sentiment_persian_farsi_rasooli3003_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_sentiment_persian_farsi_rasooli3003_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sentiment_persian_farsi_rasooli3003_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|608.8 MB| + +## References + +https://huggingface.co/Rasooli3003/Bert-Sentiment-Fa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bhavik_finetuning_sentiment_model_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bhavik_finetuning_sentiment_model_1_pipeline_en.md new file mode 100644 index 00000000000000..40e231209b8cb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bhavik_finetuning_sentiment_model_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bhavik_finetuning_sentiment_model_1_pipeline pipeline DistilBertForSequenceClassification from bhavikardeshna +author: John Snow Labs +name: bhavik_finetuning_sentiment_model_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bhavik_finetuning_sentiment_model_1_pipeline` is a English model originally trained by bhavikardeshna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bhavik_finetuning_sentiment_model_1_pipeline_en_5.5.0_3.0_1726861015962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bhavik_finetuning_sentiment_model_1_pipeline_en_5.5.0_3.0_1726861015962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bhavik_finetuning_sentiment_model_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bhavik_finetuning_sentiment_model_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bhavik_finetuning_sentiment_model_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bhavikardeshna/bhavik-finetuning-sentiment-model-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bobo_emb_en.md b/docs/_posts/ahmedlone127/2024-09-20-bobo_emb_en.md new file mode 100644 index 00000000000000..aacfe6d7b2be8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bobo_emb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bobo_emb BertForSequenceClassification from Bobouo +author: John Snow Labs +name: bobo_emb +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bobo_emb` is a English model originally trained by Bobouo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bobo_emb_en_5.5.0_3.0_1726829231032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bobo_emb_en_5.5.0_3.0_1726829231032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bobo_emb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bobo_emb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bobo_emb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/Bobouo/Bobo_emb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en.md b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en.md new file mode 100644 index 00000000000000..be43401cbac69f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_symptemist_fasttext_75_ner RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_fasttext_75_ner +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_fasttext_75_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en_5.5.0_3.0_1726847469915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_fasttext_75_ner_en_5.5.0_3.0_1726847469915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist_fasttext_75_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist_fasttext_75_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_fasttext_75_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|435.7 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-symptemist-fasttext-75-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_akhil9514_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_akhil9514_pipeline_en.md new file mode 100644 index 00000000000000..8984ce9be8fcab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_akhil9514_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_akhil9514_pipeline pipeline DistilBertForSequenceClassification from Akhil9514 +author: John Snow Labs +name: burmese_awesome_model_akhil9514_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_akhil9514_pipeline` is a English model originally trained by Akhil9514. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_akhil9514_pipeline_en_5.5.0_3.0_1726832753670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_akhil9514_pipeline_en_5.5.0_3.0_1726832753670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_akhil9514_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_akhil9514_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_akhil9514_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Akhil9514/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_en.md new file mode 100644 index 00000000000000..c9336817ce0a41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_gamdalf DistilBertForSequenceClassification from Gamdalf +author: John Snow Labs +name: burmese_awesome_model_gamdalf +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_gamdalf` is a English model originally trained by Gamdalf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gamdalf_en_5.5.0_3.0_1726842369443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gamdalf_en_5.5.0_3.0_1726842369443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_gamdalf","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_gamdalf", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_gamdalf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gamdalf/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_pipeline_en.md new file mode 100644 index 00000000000000..a73027e59d3af4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_gamdalf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_gamdalf_pipeline pipeline DistilBertForSequenceClassification from Gamdalf +author: John Snow Labs +name: burmese_awesome_model_gamdalf_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_gamdalf_pipeline` is a English model originally trained by Gamdalf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gamdalf_pipeline_en_5.5.0_3.0_1726842381442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_gamdalf_pipeline_en_5.5.0_3.0_1726842381442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_gamdalf_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_gamdalf_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_gamdalf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gamdalf/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_pipeline_en.md new file mode 100644 index 00000000000000..9c4b7c77bb86fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_priority_3_pipeline pipeline DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: burmese_awesome_model_priority_3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_priority_3_pipeline` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_3_pipeline_en_5.5.0_3.0_1726842296394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_3_pipeline_en_5.5.0_3.0_1726842296394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_priority_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_priority_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_priority_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/my_awesome_model_priority_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_ruhullah1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_ruhullah1_pipeline_en.md new file mode 100644 index 00000000000000..554c459b510404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_ruhullah1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_ruhullah1_pipeline pipeline DistilBertForSequenceClassification from ruhullah1 +author: John Snow Labs +name: burmese_awesome_model_ruhullah1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ruhullah1_pipeline` is a English model originally trained by ruhullah1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ruhullah1_pipeline_en_5.5.0_3.0_1726841476969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ruhullah1_pipeline_en_5.5.0_3.0_1726841476969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_ruhullah1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_ruhullah1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ruhullah1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ruhullah1/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_en.md new file mode 100644 index 00000000000000..71d719cdf7bb4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_qa_model_anamgarcia_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_anamgarcia DistilBertForQuestionAnswering from anamgarcia +author: John Snow Labs +name: burmese_awesome_qa_model_anamgarcia +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_anamgarcia` is a English model originally trained by anamgarcia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_anamgarcia_en_5.5.0_3.0_1726851164219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_anamgarcia_en_5.5.0_3.0_1726851164219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_anamgarcia","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("burmese_awesome_qa_model_anamgarcia", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_anamgarcia| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/anamgarcia/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_insurance_mlm_model_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_insurance_mlm_model_en.md new file mode 100644 index 00000000000000..521c668736c70e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_insurance_mlm_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_insurance_mlm_model RoBertaEmbeddings from michaelfong2017 +author: John Snow Labs +name: burmese_insurance_mlm_model +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_insurance_mlm_model` is a English model originally trained by michaelfong2017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_insurance_mlm_model_en_5.5.0_3.0_1726793771854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_insurance_mlm_model_en_5.5.0_3.0_1726793771854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_insurance_mlm_model","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_insurance_mlm_model","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_insurance_mlm_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/michaelfong2017/my_insurance_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..c33c7680e22689 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_1_html_distilbert_base_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_1_html_distilbert_base_uncased_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: cat_1_html_distilbert_base_uncased_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_1_html_distilbert_base_uncased_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_1_html_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1726832512257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_1_html_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1726832512257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cat_1_html_distilbert_base_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cat_1_html_distilbert_base_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_1_html_distilbert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/cat-1-html-distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_pipeline_en.md new file mode 100644 index 00000000000000..7f0505cd0d3743 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_ner_spanish_5_pipeline pipeline RoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_spanish_5_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_spanish_5_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_5_pipeline_en_5.5.0_3.0_1726847592576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_5_pipeline_en_5.5.0_3.0_1726847592576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cat_ner_spanish_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cat_ner_spanish_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_spanish_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-es-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-criminal_case_classifier1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-criminal_case_classifier1_pipeline_en.md new file mode 100644 index 00000000000000..5ab2387530b537 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-criminal_case_classifier1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English criminal_case_classifier1_pipeline pipeline DistilBertForSequenceClassification from LahiruProjects +author: John Snow Labs +name: criminal_case_classifier1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`criminal_case_classifier1_pipeline` is a English model originally trained by LahiruProjects. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/criminal_case_classifier1_pipeline_en_5.5.0_3.0_1726841006254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/criminal_case_classifier1_pipeline_en_5.5.0_3.0_1726841006254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("criminal_case_classifier1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("criminal_case_classifier1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|criminal_case_classifier1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LahiruProjects/criminal-case-classifier1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-custommodel_v2c_k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-custommodel_v2c_k_pipeline_en.md new file mode 100644 index 00000000000000..03607054eccbc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-custommodel_v2c_k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English custommodel_v2c_k_pipeline pipeline DistilBertForSequenceClassification from katowtkkk +author: John Snow Labs +name: custommodel_v2c_k_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custommodel_v2c_k_pipeline` is a English model originally trained by katowtkkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custommodel_v2c_k_pipeline_en_5.5.0_3.0_1726832619598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custommodel_v2c_k_pipeline_en_5.5.0_3.0_1726832619598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("custommodel_v2c_k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("custommodel_v2c_k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custommodel_v2c_k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|257.2 KB| + +## References + +https://huggingface.co/katowtkkk/CustomModel_v2c_k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cyberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-cyberta_pipeline_en.md new file mode 100644 index 00000000000000..248fac1e8b7fe8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cyberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cyberta_pipeline pipeline RoBertaEmbeddings from mstaron +author: John Snow Labs +name: cyberta_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cyberta_pipeline` is a English model originally trained by mstaron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cyberta_pipeline_en_5.5.0_3.0_1726816194478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cyberta_pipeline_en_5.5.0_3.0_1726816194478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cyberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cyberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cyberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.9 MB| + +## References + +https://huggingface.co/mstaron/CyBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-db_mc_2_0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-db_mc_2_0_1_pipeline_en.md new file mode 100644 index 00000000000000..dcb4a9a56cf11e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-db_mc_2_0_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English db_mc_2_0_1_pipeline pipeline DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc_2_0_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc_2_0_1_pipeline` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc_2_0_1_pipeline_en_5.5.0_3.0_1726792141759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc_2_0_1_pipeline_en_5.5.0_3.0_1726792141759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("db_mc_2_0_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("db_mc_2_0_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc_2_0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/exala/db_mc_2.0.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert5_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert5_en.md new file mode 100644 index 00000000000000..f8f715269defd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert5 DistilBertForSequenceClassification from deptage +author: John Snow Labs +name: distilbert5 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert5` is a English model originally trained by deptage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert5_en_5.5.0_3.0_1726848919402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert5_en_5.5.0_3.0_1726848919402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deptage/distilbert5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_denyszakharkevych_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_denyszakharkevych_pipeline_en.md new file mode 100644 index 00000000000000..94d814e3fdae8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_denyszakharkevych_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_denyszakharkevych_pipeline pipeline DistilBertForSequenceClassification from DenysZakharkevych +author: John Snow Labs +name: distilbert_base_uncased_denyszakharkevych_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_denyszakharkevych_pipeline` is a English model originally trained by DenysZakharkevych. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_denyszakharkevych_pipeline_en_5.5.0_3.0_1726832408290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_denyszakharkevych_pipeline_en_5.5.0_3.0_1726832408290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_denyszakharkevych_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_denyszakharkevych_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_denyszakharkevych_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DenysZakharkevych/distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_en.md new file mode 100644 index 00000000000000..d812c25b14136a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_ehottl DistilBertForSequenceClassification from ehottl +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_ehottl +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_ehottl` is a English model originally trained by ehottl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_ehottl_en_5.5.0_3.0_1726842553073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_ehottl_en_5.5.0_3.0_1726842553073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_ehottl","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_ehottl", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_ehottl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/ehottl/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en.md new file mode 100644 index 00000000000000..a40b58af234ee8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_ehottl_pipeline pipeline DistilBertForSequenceClassification from ehottl +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_ehottl_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_ehottl_pipeline` is a English model originally trained by ehottl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en_5.5.0_3.0_1726842564848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_ehottl_pipeline_en_5.5.0_3.0_1726842564848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_ehottl_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_ehottl_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_ehottl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/ehottl/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_maarten1953_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_maarten1953_en.md new file mode 100644 index 00000000000000..cbbb3513f9bb4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_distilled_clinc_maarten1953_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_maarten1953 DistilBertForSequenceClassification from Maarten1953 +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_maarten1953 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_maarten1953` is a English model originally trained by Maarten1953. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_maarten1953_en_5.5.0_3.0_1726871658428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_maarten1953_en_5.5.0_3.0_1726871658428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_maarten1953","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_maarten1953", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_maarten1953| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Maarten1953/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en.md new file mode 100644 index 00000000000000..e5407aab499f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_fibleep_pipeline pipeline DistilBertForSequenceClassification from fibleep +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_fibleep_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_fibleep_pipeline` is a English model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en_5.5.0_3.0_1726861140678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_fibleep_pipeline_en_5.5.0_3.0_1726861140678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_fibleep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_fibleep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_fibleep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/fibleep/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_mealduct_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_mealduct_en.md new file mode 100644 index 00000000000000..dc267b6f49fb7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_clinc_mealduct_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_mealduct DistilBertForSequenceClassification from MealDuct +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_mealduct +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_mealduct` is a English model originally trained by MealDuct. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_mealduct_en_5.5.0_3.0_1726792283067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_mealduct_en_5.5.0_3.0_1726792283067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_mealduct","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_mealduct", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_mealduct| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MealDuct/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_majid097_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_majid097_pipeline_en.md new file mode 100644 index 00000000000000..88822832226a6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_cola_majid097_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_majid097_pipeline pipeline DistilBertForSequenceClassification from Majid097 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_majid097_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_majid097_pipeline` is a English model originally trained by Majid097. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_majid097_pipeline_en_5.5.0_3.0_1726842135296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_majid097_pipeline_en_5.5.0_3.0_1726842135296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_majid097_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_majid097_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_majid097_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Majid097/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en.md new file mode 100644 index 00000000000000..2ca2674ab61f71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_btown2_pipeline pipeline DistilBertForSequenceClassification from btown2 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_btown2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_btown2_pipeline` is a English model originally trained by btown2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en_5.5.0_3.0_1726809411677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_btown2_pipeline_en_5.5.0_3.0_1726809411677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_btown2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_btown2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_btown2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/btown2/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en.md new file mode 100644 index 00000000000000..acbeba3328cfdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_chhabi_pipeline pipeline DistilBertForSequenceClassification from Chhabi +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_chhabi_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_chhabi_pipeline` is a English model originally trained by Chhabi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en_5.5.0_3.0_1726823542054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_chhabi_pipeline_en_5.5.0_3.0_1726823542054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_chhabi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_chhabi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_chhabi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chhabi/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_devborbot_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_devborbot_en.md new file mode 100644 index 00000000000000..a1049e4386601c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_devborbot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_devborbot DistilBertForSequenceClassification from devBorbot +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_devborbot +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_devborbot` is a English model originally trained by devBorbot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devborbot_en_5.5.0_3.0_1726861011931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devborbot_en_5.5.0_3.0_1726861011931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_devborbot","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_devborbot", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_devborbot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/devBorbot/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en.md new file mode 100644 index 00000000000000..6e83b74800dd34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline pipeline DistilBertForSequenceClassification from edarmartinez +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline` is a English model originally trained by edarmartinez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en_5.5.0_3.0_1726830236335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline_en_5.5.0_3.0_1726830236335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_edarmartinez_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edarmartinez/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_farzanmrz_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_farzanmrz_en.md new file mode 100644 index 00000000000000..efe5f5fff5f64d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_farzanmrz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_farzanmrz DistilBertForSequenceClassification from farzanmrz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_farzanmrz +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_farzanmrz` is a English model originally trained by farzanmrz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farzanmrz_en_5.5.0_3.0_1726791988989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_farzanmrz_en_5.5.0_3.0_1726791988989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_farzanmrz","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_farzanmrz", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_farzanmrz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/farzanmrz/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en.md new file mode 100644 index 00000000000000..1a0791689ad4ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_henrikho_pipeline pipeline DistilBertForSequenceClassification from henrikho +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_henrikho_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_henrikho_pipeline` is a English model originally trained by henrikho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en_5.5.0_3.0_1726809439723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_henrikho_pipeline_en_5.5.0_3.0_1726809439723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_henrikho_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_henrikho_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_henrikho_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/henrikho/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en.md new file mode 100644 index 00000000000000..7ac7155c95bcad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jhagege_pipeline pipeline DistilBertForSequenceClassification from jhagege +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jhagege_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jhagege_pipeline` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en_5.5.0_3.0_1726842176211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhagege_pipeline_en_5.5.0_3.0_1726842176211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jhagege_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jhagege_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jhagege_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jhagege/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en.md new file mode 100644 index 00000000000000..d31ba0f70546c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline pipeline DistilBertForSequenceClassification from Minsu-Chae +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline` is a English model originally trained by Minsu-Chae. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en_5.5.0_3.0_1726809347022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline_en_5.5.0_3.0_1726809347022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_minsu_chae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Minsu-Chae/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en.md new file mode 100644 index 00000000000000..db25e25c9493ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_richwiss_pipeline pipeline DistilBertForSequenceClassification from richwiss +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_richwiss_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_richwiss_pipeline` is a English model originally trained by richwiss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en_5.5.0_3.0_1726830194022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_richwiss_pipeline_en_5.5.0_3.0_1726830194022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_richwiss_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_richwiss_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_richwiss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/richwiss/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en.md new file mode 100644 index 00000000000000..11f5184a4c1126 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_thinsu_pipeline pipeline DistilBertForSequenceClassification from ThinSu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_thinsu_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_thinsu_pipeline` is a English model originally trained by ThinSu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en_5.5.0_3.0_1726871623442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thinsu_pipeline_en_5.5.0_3.0_1726871623442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thinsu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thinsu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_thinsu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ThinSu/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_m_express_emo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_m_express_emo_pipeline_en.md new file mode 100644 index 00000000000000..50889919152682 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_m_express_emo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_express_emo_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_express_emo_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_express_emo_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_express_emo_pipeline_en_5.5.0_3.0_1726823960076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_express_emo_pipeline_en_5.5.0_3.0_1726823960076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_m_express_emo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_m_express_emo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_express_emo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_express_emo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_rottentomatoes_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_rottentomatoes_en.md new file mode 100644 index 00000000000000..413470ed049ba7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_rottentomatoes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_rottentomatoes DistilBertForSequenceClassification from mohamedsaeed823 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_rottentomatoes +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_rottentomatoes` is a English model originally trained by mohamedsaeed823. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_rottentomatoes_en_5.5.0_3.0_1726871306848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_rottentomatoes_en_5.5.0_3.0_1726871306848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_rottentomatoes","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_rottentomatoes", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_rottentomatoes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mohamedsaeed823/distilbert-base-uncased-finetuned-RottenTomatoes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_en.md new file mode 100644 index 00000000000000..f401ab2e76cc15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_scam_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_scam_classification DistilBertForSequenceClassification from jaranohaal +author: John Snow Labs +name: distilbert_base_uncased_finetuned_scam_classification +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_scam_classification` is a English model originally trained by jaranohaal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_scam_classification_en_5.5.0_3.0_1726809524754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_scam_classification_en_5.5.0_3.0_1726809524754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_scam_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_scam_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_scam_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jaranohaal/distilbert-base-uncased-finetuned-scam-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_ft_2_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_ft_2_en.md new file mode 100644 index 00000000000000..e4725b09c62a16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_ft_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_ft_2 DistilBertForSequenceClassification from keylazy +author: John Snow Labs +name: distilbert_base_uncased_ft_2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_ft_2` is a English model originally trained by keylazy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ft_2_en_5.5.0_3.0_1726871482400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ft_2_en_5.5.0_3.0_1726871482400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ft_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ft_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_ft_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/keylazy/distilbert-base-uncased-ft-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_imdb_edg3_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_imdb_edg3_en.md new file mode 100644 index 00000000000000..bcd47f73c4c9a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_imdb_edg3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_imdb_edg3 DistilBertForSequenceClassification from edg3 +author: John Snow Labs +name: distilbert_base_uncased_imdb_edg3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_imdb_edg3` is a English model originally trained by edg3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_edg3_en_5.5.0_3.0_1726832412910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_edg3_en_5.5.0_3.0_1726832412910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_imdb_edg3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_imdb_edg3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_imdb_edg3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/edg3/distilbert-base-uncased-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md new file mode 100644 index 00000000000000..03fb0b53ba4928 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1726861032032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1726861032032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st11sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st11sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en.md new file mode 100644 index 00000000000000..4520fc86b40a44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en_5.5.0_3.0_1726848789875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_en_5.5.0_3.0_1726848789875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut1large2PfxNf_simsp400_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en.md new file mode 100644 index 00000000000000..101d87c5450b0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726848802411.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726848802411.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut1large2pfxnf_simsp400_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut1large2PfxNf_simsp400_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en.md new file mode 100644 index 00000000000000..0b090b71f00188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en_5.5.0_3.0_1726823755633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp_en_5.5.0_3.0_1726823755633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge80_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge80_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..137b21a3950c22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en_5.5.0_3.0_1726841234253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline_en_5.5.0_3.0_1726841234253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large40pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large40PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en.md new file mode 100644 index 00000000000000..c944690cba0645 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en_5.5.0_3.0_1726849030957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100_en_5.5.0_3.0_1726849030957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1_plprefix0stlarge5_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1_PLPrefix0stlarge5_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..8d33df5112a0c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726832505338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726832505338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1largepfxnf_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st7sd_ut72ut1largePfxNf_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..a67a5f62fe9ae9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726809352362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1726809352362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_plprefix0stlarge_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_odm_zphr_0st102sd_random_ut72ut1_PLPrefix0stlarge_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en.md new file mode 100644 index 00000000000000..e6ff8293e1b924 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en_5.5.0_3.0_1726809637823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline_en_5.5.0_3.0_1726809637823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut12ut1_plain_sp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut12ut1_plain_sp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_en.md new file mode 100644 index 00000000000000..36ef3512b70439 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_tuvalu_zephyr_3shot DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_tuvalu_zephyr_3shot +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_tuvalu_zephyr_3shot` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_3shot_en_5.5.0_3.0_1726832865573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_3shot_en_5.5.0_3.0_1726832865573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_tuvalu_zephyr_3shot","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_tuvalu_zephyr_3shot", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_tuvalu_zephyr_3shot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_tvl_zephyr_3shot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en.md new file mode 100644 index 00000000000000..756ba3b9a41acf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en_5.5.0_3.0_1726832878533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline_en_5.5.0_3.0_1726832878533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_tuvalu_zephyr_3shot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_tvl_zephyr_3shot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en.md new file mode 100644 index 00000000000000..b0d6246ffb017a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en_5.5.0_3.0_1726823484191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline_en_5.5.0_3.0_1726823484191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_work_zphr_0st_ut72ut1_ad7dsc3_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_classification_task1_post_title_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_classification_task1_post_title_pipeline_en.md new file mode 100644 index 00000000000000..2450e8c6f1e293 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_classification_task1_post_title_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_classification_task1_post_title_pipeline pipeline DistilBertForSequenceClassification from abdulmanaam +author: John Snow Labs +name: distilbert_classification_task1_post_title_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_classification_task1_post_title_pipeline` is a English model originally trained by abdulmanaam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_classification_task1_post_title_pipeline_en_5.5.0_3.0_1726842227193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_classification_task1_post_title_pipeline_en_5.5.0_3.0_1726842227193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_classification_task1_post_title_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_classification_task1_post_title_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_classification_task1_post_title_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abdulmanaam/distilbert_classification_task1_post_title + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_vicentenedor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_vicentenedor_pipeline_en.md new file mode 100644 index 00000000000000..0b6a45d1e89188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_finetuned_vicentenedor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_vicentenedor_pipeline pipeline DistilBertForSequenceClassification from vicentenedor +author: John Snow Labs +name: distilbert_finetuned_vicentenedor_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_vicentenedor_pipeline` is a English model originally trained by vicentenedor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_vicentenedor_pipeline_en_5.5.0_3.0_1726841231438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_vicentenedor_pipeline_en_5.5.0_3.0_1726841231438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_vicentenedor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_vicentenedor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_vicentenedor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vicentenedor/distilbert-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_huggingface_ysphang_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_huggingface_ysphang_en.md new file mode 100644 index 00000000000000..0eb734f6655c09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_huggingface_ysphang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_huggingface_ysphang DistilBertForSequenceClassification from ysphang +author: John Snow Labs +name: distilbert_imdb_huggingface_ysphang +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_huggingface_ysphang` is a English model originally trained by ysphang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huggingface_ysphang_en_5.5.0_3.0_1726860726167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_huggingface_ysphang_en_5.5.0_3.0_1726860726167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_huggingface_ysphang","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_huggingface_ysphang", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_huggingface_ysphang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ysphang/DISTILBERT-IMDB-HUGGINGFACE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_padding70model_en.md new file mode 100644 index 00000000000000..59429875626677 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_imdb_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_imdb_padding70model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding70model_en_5.5.0_3.0_1726823689037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding70model_en_5.5.0_3.0_1726823689037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_padding70model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_padding70model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_imdb_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_en.md new file mode 100644 index 00000000000000..88cb9c046417dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_intent_tanmoyeeroy DistilBertForSequenceClassification from TanmoyeeRoy +author: John Snow Labs +name: distilbert_intent_tanmoyeeroy +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_intent_tanmoyeeroy` is a English model originally trained by TanmoyeeRoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_intent_tanmoyeeroy_en_5.5.0_3.0_1726829858647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_intent_tanmoyeeroy_en_5.5.0_3.0_1726829858647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_intent_tanmoyeeroy","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_intent_tanmoyeeroy", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_intent_tanmoyeeroy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TanmoyeeRoy/distilbert_intent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_pipeline_en.md new file mode 100644 index 00000000000000..418dbbb068563e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_intent_tanmoyeeroy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_intent_tanmoyeeroy_pipeline pipeline DistilBertForSequenceClassification from TanmoyeeRoy +author: John Snow Labs +name: distilbert_intent_tanmoyeeroy_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_intent_tanmoyeeroy_pipeline` is a English model originally trained by TanmoyeeRoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_intent_tanmoyeeroy_pipeline_en_5.5.0_3.0_1726829870358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_intent_tanmoyeeroy_pipeline_en_5.5.0_3.0_1726829870358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_intent_tanmoyeeroy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_intent_tanmoyeeroy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_intent_tanmoyeeroy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TanmoyeeRoy/distilbert_intent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_en.md new file mode 100644 index 00000000000000..74299fd5ba9b89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_cola_384 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_cola_384 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_cola_384` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_384_en_5.5.0_3.0_1726842085285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_384_en_5.5.0_3.0_1726842085285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_cola_384","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_cola_384", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_cola_384| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|111.8 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_cola_384 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_96_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_96_en.md new file mode 100644 index 00000000000000..05431ac042f308 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_96_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_cola_96 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_cola_96 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_cola_96` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_96_en_5.5.0_3.0_1726849132877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_96_en_5.5.0_3.0_1726849132877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_cola_96","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_cola_96", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_cola_96| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|25.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_cola_96 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_scam_classifier_v1_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_scam_classifier_v1_1_en.md new file mode 100644 index 00000000000000..38d4168ba3abe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_scam_classifier_v1_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_scam_classifier_v1_1 DistilBertForSequenceClassification from BothBosu +author: John Snow Labs +name: distilbert_scam_classifier_v1_1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_scam_classifier_v1_1` is a English model originally trained by BothBosu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_scam_classifier_v1_1_en_5.5.0_3.0_1726841091839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_scam_classifier_v1_1_en_5.5.0_3.0_1726841091839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_scam_classifier_v1_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_scam_classifier_v1_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_scam_classifier_v1_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BothBosu/distilbert-scam-classifier-v1.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sst2_padding40model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sst2_padding40model_pipeline_en.md new file mode 100644 index 00000000000000..e06a3aa502ad9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sst2_padding40model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sst2_padding40model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding40model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding40model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding40model_pipeline_en_5.5.0_3.0_1726842195824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding40model_pipeline_en_5.5.0_3.0_1726842195824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sst2_padding40model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sst2_padding40model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding40model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding40model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_askwomen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_askwomen_pipeline_en.md new file mode 100644 index 00000000000000..943bdfaea82e5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_askwomen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_askwomen_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_askwomen_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_askwomen_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_askwomen_pipeline_en_5.5.0_3.0_1726857426438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_askwomen_pipeline_en_5.5.0_3.0_1726857426438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_askwomen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_askwomen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_askwomen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-AskWomen + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_wow_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_wow_en.md new file mode 100644 index 00000000000000..2c003fecc991e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilroberta_base_ft_wow_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_wow RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_wow +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_wow` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_wow_en_5.5.0_3.0_1726815880227.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_wow_en_5.5.0_3.0_1726815880227.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_wow","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_wow","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_wow| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-wow \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_en.md new file mode 100644 index 00000000000000..9aca7377fd2454 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilrubert_tiny_2nd_finetune_epru_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilrubert_tiny_2nd_finetune_epru DistilBertForSequenceClassification from mmillet +author: John Snow Labs +name: distilrubert_tiny_2nd_finetune_epru +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilrubert_tiny_2nd_finetune_epru` is a English model originally trained by mmillet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilrubert_tiny_2nd_finetune_epru_en_5.5.0_3.0_1726848739170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilrubert_tiny_2nd_finetune_epru_en_5.5.0_3.0_1726848739170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilrubert_tiny_2nd_finetune_epru","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilrubert_tiny_2nd_finetune_epru", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilrubert_tiny_2nd_finetune_epru| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|39.2 MB| + +## References + +https://huggingface.co/mmillet/distilrubert-tiny-2nd-finetune-epru \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-email_answer_extraction_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-email_answer_extraction_pipeline_en.md new file mode 100644 index 00000000000000..99592426ee612d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-email_answer_extraction_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English email_answer_extraction_pipeline pipeline RoBertaForTokenClassification from arya555 +author: John Snow Labs +name: email_answer_extraction_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`email_answer_extraction_pipeline` is a English model originally trained by arya555. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/email_answer_extraction_pipeline_en_5.5.0_3.0_1726853200290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/email_answer_extraction_pipeline_en_5.5.0_3.0_1726853200290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("email_answer_extraction_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("email_answer_extraction_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|email_answer_extraction_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.0 MB| + +## References + +https://huggingface.co/arya555/email_answer_extraction + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_bernice_en.md new file mode 100644 index 00000000000000..db8e4dce97a937 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-emoji_emoji_random0_seed1_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random0_seed1_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed1_bernice +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed1_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_bernice_en_5.5.0_3.0_1726866019659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed1_bernice_en_5.5.0_3.0_1726866019659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed1_bernice","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed1_bernice", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed1_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|825.3 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed1-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_en.md b/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_en.md new file mode 100644 index 00000000000000..21820307ee173e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_reviews_finetuned_model_epoch_05 DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: final_reviews_finetuned_model_epoch_05 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_reviews_finetuned_model_epoch_05` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_reviews_finetuned_model_epoch_05_en_5.5.0_3.0_1726849045204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_reviews_finetuned_model_epoch_05_en_5.5.0_3.0_1726849045204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_reviews_finetuned_model_epoch_05","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_reviews_finetuned_model_epoch_05", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_reviews_finetuned_model_epoch_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Final-reviews-Finetuned-model-Epoch-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en.md new file mode 100644 index 00000000000000..cd7a954315a1cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001 BertForQuestionAnswering from muhammadravi251001 +author: John Snow Labs +name: fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001` is a English model originally trained by muhammadravi251001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en_5.5.0_3.0_1726820457250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001_en_5.5.0_3.0_1726820457250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_datasetqas_idk_mrc_with_indobert_base_uncased_with_ittl_without_freeze_lr_1e_05_muhammadravi251001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/muhammadravi251001/fine-tuned-DatasetQAS-IDK-MRC-with-indobert-base-uncased-with-ITTL-without-freeze-LR-1e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_model_v1_en.md b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_model_v1_en.md new file mode 100644 index 00000000000000..9df8a707b723e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-fine_tuned_model_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_model_v1 DistilBertForSequenceClassification from rb757 +author: John Snow Labs +name: fine_tuned_model_v1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_model_v1` is a English model originally trained by rb757. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_model_v1_en_5.5.0_3.0_1726792130422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_model_v1_en_5.5.0_3.0_1726792130422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_model_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuned_model_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_model_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rb757/fine_tuned_model_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_en.md new file mode 100644 index 00000000000000..7d8582fd976a95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_amazon_sample100000_text_robertamodel RoBertaForSequenceClassification from hsiuping +author: John Snow Labs +name: finetuning_amazon_sample100000_text_robertamodel +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_amazon_sample100000_text_robertamodel` is a English model originally trained by hsiuping. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_amazon_sample100000_text_robertamodel_en_5.5.0_3.0_1726851763956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_amazon_sample100000_text_robertamodel_en_5.5.0_3.0_1726851763956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuning_amazon_sample100000_text_robertamodel","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuning_amazon_sample100000_text_robertamodel", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_amazon_sample100000_text_robertamodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/hsiuping/finetuning-amazon-sample100000-text-RoBERTamodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en.md new file mode 100644 index 00000000000000..cea6fb14915677 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline pipeline DistilBertForSequenceClassification from K-Ray +author: John Snow Labs +name: finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline` is a English model originally trained by K-Ray. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en_5.5.0_3.0_1726792480378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline_en_5.5.0_3.0_1726792480378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_imdb_sentiment_model_3000_samples_k_ray_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-Ray/finetuning-imdb-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_3_pipeline_en.md new file mode 100644 index 00000000000000..4da8a516e79c7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_3_pipeline pipeline DistilBertForSequenceClassification from mamledes +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_3_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_3_pipeline` is a English model originally trained by mamledes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_3_pipeline_en_5.5.0_3.0_1726841986100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_3_pipeline_en_5.5.0_3.0_1726841986100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mamledes/finetuning-sentiment-model-3000-samples_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en.md new file mode 100644 index 00000000000000..046d821fa58e68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_david1987bb_pipeline pipeline DistilBertForSequenceClassification from David1987BB +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_david1987bb_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_david1987bb_pipeline` is a English model originally trained by David1987BB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en_5.5.0_3.0_1726848706284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_david1987bb_pipeline_en_5.5.0_3.0_1726848706284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_david1987bb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_david1987bb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_david1987bb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/David1987BB/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_pipeline_en.md new file mode 100644 index 00000000000000..fd8d5913f3d4be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_mk_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_mk_20_pipeline pipeline DistilBertForSequenceClassification from mk-20 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_mk_20_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_mk_20_pipeline` is a English model originally trained by mk-20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mk_20_pipeline_en_5.5.0_3.0_1726809117914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_mk_20_pipeline_en_5.5.0_3.0_1726809117914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_mk_20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_mk_20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_mk_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mk-20/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_naomaru_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_naomaru_pipeline_en.md new file mode 100644 index 00000000000000..1d69e3e0fdd3c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_naomaru_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_naomaru_pipeline pipeline DistilBertForSequenceClassification from naomaru +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_naomaru_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_naomaru_pipeline` is a English model originally trained by naomaru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_naomaru_pipeline_en_5.5.0_3.0_1726809266024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_naomaru_pipeline_en_5.5.0_3.0_1726809266024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_naomaru_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_naomaru_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_naomaru_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/naomaru/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_en.md new file mode 100644 index 00000000000000..f842c0cda51a88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3004_samples_valliyammai DistilBertForSequenceClassification from Valliyammai +author: John Snow Labs +name: finetuning_sentiment_model_3004_samples_valliyammai +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3004_samples_valliyammai` is a English model originally trained by Valliyammai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3004_samples_valliyammai_en_5.5.0_3.0_1726841255887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3004_samples_valliyammai_en_5.5.0_3.0_1726841255887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3004_samples_valliyammai","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3004_samples_valliyammai", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3004_samples_valliyammai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Valliyammai/finetuning-sentiment-model-3004-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en.md new file mode 100644 index 00000000000000..e63c54a1ad5f40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3004_samples_valliyammai_pipeline pipeline DistilBertForSequenceClassification from Valliyammai +author: John Snow Labs +name: finetuning_sentiment_model_3004_samples_valliyammai_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3004_samples_valliyammai_pipeline` is a English model originally trained by Valliyammai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en_5.5.0_3.0_1726841268457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3004_samples_valliyammai_pipeline_en_5.5.0_3.0_1726841268457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3004_samples_valliyammai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3004_samples_valliyammai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3004_samples_valliyammai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Valliyammai/finetuning-sentiment-model-3004-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_en.md new file mode 100644 index 00000000000000..b5bf93ebfbe913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon_samples DistilBertForSequenceClassification from maurocastill +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon_samples +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon_samples` is a English model originally trained by maurocastill. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_samples_en_5.5.0_3.0_1726830440123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_samples_en_5.5.0_3.0_1726830440123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon_samples","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon_samples", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/maurocastill/finetuning-sentiment-model-5000-amazon-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_pipeline_en.md new file mode 100644 index 00000000000000..9ca1ad56ca62ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazon_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon_samples_pipeline pipeline DistilBertForSequenceClassification from maurocastill +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon_samples_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon_samples_pipeline` is a English model originally trained by maurocastill. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_samples_pipeline_en_5.5.0_3.0_1726830453954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_samples_pipeline_en_5.5.0_3.0_1726830453954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_amazon_samples_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_amazon_samples_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/maurocastill/finetuning-sentiment-model-5000-amazon-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_pipeline_en.md new file mode 100644 index 00000000000000..689e2f6339c6d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_enriquer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_enriquer_pipeline pipeline DistilBertForSequenceClassification from EnriqueR +author: John Snow Labs +name: finetuning_sentiment_model_enriquer_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_enriquer_pipeline` is a English model originally trained by EnriqueR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_enriquer_pipeline_en_5.5.0_3.0_1726792542831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_enriquer_pipeline_en_5.5.0_3.0_1726792542831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_enriquer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_enriquer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_enriquer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EnriqueR/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en.md new file mode 100644 index 00000000000000..8d93a0b96963fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline pipeline XlmRoBertaForSequenceClassification from Shijia +author: John Snow Labs +name: furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline` is a English model originally trained by Shijia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en_5.5.0_3.0_1726865591572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline_en_5.5.0_3.0_1726865591572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_seed42_eng_kinyarwanda_hau_cross_2e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/Shijia/furina_seed42_eng_kin_hau_cross_2e-05 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random0_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random0_seed0_bernice_en.md new file mode 100644 index 00000000000000..3d58fa75ca3efc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hate_hate_random0_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random0_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random0_seed0_bernice +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random0_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed0_bernice_en_5.5.0_3.0_1726865766972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random0_seed0_bernice_en_5.5.0_3.0_1726865766972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_random0_seed0_bernice","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_random0_seed0_bernice", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random0_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.4 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random0_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-hw_1_jerry81920_en.md b/docs/_posts/ahmedlone127/2024-09-20-hw_1_jerry81920_en.md new file mode 100644 index 00000000000000..5f33c3b6c79277 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-hw_1_jerry81920_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw_1_jerry81920 DistilBertForSequenceClassification from Jerry81920 +author: John Snow Labs +name: hw_1_jerry81920 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw_1_jerry81920` is a English model originally trained by Jerry81920. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw_1_jerry81920_en_5.5.0_3.0_1726829860054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw_1_jerry81920_en_5.5.0_3.0_1726829860054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw_1_jerry81920","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw_1_jerry81920", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw_1_jerry81920| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jerry81920/hw-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ibert_roberta_base_finetuned_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-20-ibert_roberta_base_finetuned_imdb_en.md new file mode 100644 index 00000000000000..28c36946bdd58a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ibert_roberta_base_finetuned_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ibert_roberta_base_finetuned_imdb RoBertaEmbeddings from elayat +author: John Snow Labs +name: ibert_roberta_base_finetuned_imdb +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ibert_roberta_base_finetuned_imdb` is a English model originally trained by elayat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ibert_roberta_base_finetuned_imdb_en_5.5.0_3.0_1726815707272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ibert_roberta_base_finetuned_imdb_en_5.5.0_3.0_1726815707272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ibert_roberta_base_finetuned_imdb","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ibert_roberta_base_finetuned_imdb","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ibert_roberta_base_finetuned_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/elayat/ibert-roberta-base-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-intent_classification_distilbert_hoaan2003_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-intent_classification_distilbert_hoaan2003_pipeline_en.md new file mode 100644 index 00000000000000..fd7e049266ce4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-intent_classification_distilbert_hoaan2003_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English intent_classification_distilbert_hoaan2003_pipeline pipeline DistilBertForSequenceClassification from HoaAn2003 +author: John Snow Labs +name: intent_classification_distilbert_hoaan2003_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`intent_classification_distilbert_hoaan2003_pipeline` is a English model originally trained by HoaAn2003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/intent_classification_distilbert_hoaan2003_pipeline_en_5.5.0_3.0_1726848645337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/intent_classification_distilbert_hoaan2003_pipeline_en_5.5.0_3.0_1726848645337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("intent_classification_distilbert_hoaan2003_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("intent_classification_distilbert_hoaan2003_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|intent_classification_distilbert_hoaan2003_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/HoaAn2003/intent_classification_distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-kannadabert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-kannadabert_pipeline_en.md new file mode 100644 index 00000000000000..affdff991e8167 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-kannadabert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kannadabert_pipeline pipeline RoBertaEmbeddings from Chakita +author: John Snow Labs +name: kannadabert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kannadabert_pipeline` is a English model originally trained by Chakita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kannadabert_pipeline_en_5.5.0_3.0_1726857841170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kannadabert_pipeline_en_5.5.0_3.0_1726857841170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kannadabert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kannadabert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kannadabert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|450.7 MB| + +## References + +https://huggingface.co/Chakita/KannadaBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en.md b/docs/_posts/ahmedlone127/2024-09-20-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en.md new file mode 100644 index 00000000000000..7db16ae4a38c91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2 RoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1726804592386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2_en_5.5.0_3.0_1726804592386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kinyaroberta_large_kinte_finetuned_kinyarwanda_tweet_finetuned_kinyarwanda_sent2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.3 MB| + +## References + +https://huggingface.co/RogerB/kinyaRoberta-large-kinte-finetuned-kin-tweet-finetuned-kin-sent2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ks8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ks8_pipeline_en.md new file mode 100644 index 00000000000000..6a1808d8ea568f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ks8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ks8_pipeline pipeline RoBertaForSequenceClassification from aloxatel +author: John Snow Labs +name: ks8_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ks8_pipeline` is a English model originally trained by aloxatel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ks8_pipeline_en_5.5.0_3.0_1726849861184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ks8_pipeline_en_5.5.0_3.0_1726849861184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ks8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ks8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ks8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/aloxatel/KS8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_pipeline_en.md new file mode 100644 index 00000000000000..9f9cccb1a99b42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llm_hw1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llm_hw1_pipeline pipeline DistilBertForSequenceClassification from Chenbirdy +author: John Snow Labs +name: llm_hw1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_hw1_pipeline` is a English model originally trained by Chenbirdy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_hw1_pipeline_en_5.5.0_3.0_1726809449307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_hw1_pipeline_en_5.5.0_3.0_1726809449307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llm_hw1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llm_hw1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_hw1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Chenbirdy/LLM-HW1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llmclasswork1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-llmclasswork1_pipeline_en.md new file mode 100644 index 00000000000000..38efb95530c05b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llmclasswork1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llmclasswork1_pipeline pipeline DistilBertForSequenceClassification from halu1003 +author: John Snow Labs +name: llmclasswork1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llmclasswork1_pipeline` is a English model originally trained by halu1003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llmclasswork1_pipeline_en_5.5.0_3.0_1726842099151.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llmclasswork1_pipeline_en_5.5.0_3.0_1726842099151.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llmclasswork1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llmclasswork1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llmclasswork1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/halu1003/LLMClassWork1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llmhw01_wsyar_en.md b/docs/_posts/ahmedlone127/2024-09-20-llmhw01_wsyar_en.md new file mode 100644 index 00000000000000..c7317fcd17d203 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llmhw01_wsyar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llmhw01_wsyar DistilBertForSequenceClassification from wsyar +author: John Snow Labs +name: llmhw01_wsyar +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llmhw01_wsyar` is a English model originally trained by wsyar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llmhw01_wsyar_en_5.5.0_3.0_1726849045703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llmhw01_wsyar_en_5.5.0_3.0_1726849045703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmhw01_wsyar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmhw01_wsyar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llmhw01_wsyar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wsyar/llmhw01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-locale_detector_en.md b/docs/_posts/ahmedlone127/2024-09-20-locale_detector_en.md new file mode 100644 index 00000000000000..d6b06cd81d3b90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-locale_detector_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English locale_detector XlmRoBertaForSequenceClassification from yo +author: John Snow Labs +name: locale_detector +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`locale_detector` is a English model originally trained by yo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/locale_detector_en_5.5.0_3.0_1726866256880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/locale_detector_en_5.5.0_3.0_1726866256880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("locale_detector","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("locale_detector", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|locale_detector| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|844.0 MB| + +## References + +https://huggingface.co/yo/locale-detector \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-log_pretrained_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-log_pretrained_bert_pipeline_en.md new file mode 100644 index 00000000000000..8db329077bf86e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-log_pretrained_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English log_pretrained_bert_pipeline pipeline BertEmbeddings from eun-woo +author: John Snow Labs +name: log_pretrained_bert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`log_pretrained_bert_pipeline` is a English model originally trained by eun-woo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/log_pretrained_bert_pipeline_en_5.5.0_3.0_1726825616703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/log_pretrained_bert_pipeline_en_5.5.0_3.0_1726825616703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("log_pretrained_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("log_pretrained_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|log_pretrained_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/eun-woo/log_pretrained-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-max_pruned_90_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-max_pruned_90_model_pipeline_en.md new file mode 100644 index 00000000000000..8f8f7de80c2d01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-max_pruned_90_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English max_pruned_90_model_pipeline pipeline DistilBertForSequenceClassification from andygoh5 +author: John Snow Labs +name: max_pruned_90_model_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`max_pruned_90_model_pipeline` is a English model originally trained by andygoh5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/max_pruned_90_model_pipeline_en_5.5.0_3.0_1726792501048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/max_pruned_90_model_pipeline_en_5.5.0_3.0_1726792501048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("max_pruned_90_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("max_pruned_90_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|max_pruned_90_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andygoh5/max-pruned-90-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mixologydb_en.md b/docs/_posts/ahmedlone127/2024-09-20-mixologydb_en.md new file mode 100644 index 00000000000000..dc25419a4d1b3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mixologydb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mixologydb DistilBertForSequenceClassification from mclemcrew +author: John Snow Labs +name: mixologydb +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mixologydb` is a English model originally trained by mclemcrew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mixologydb_en_5.5.0_3.0_1726848544500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mixologydb_en_5.5.0_3.0_1726848544500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("mixologydb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mixologydb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mixologydb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mclemcrew/MixologyDB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-modello_finetunato_en.md b/docs/_posts/ahmedlone127/2024-09-20-modello_finetunato_en.md new file mode 100644 index 00000000000000..b9dd1122acb701 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-modello_finetunato_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English modello_finetunato DistilBertForSequenceClassification from soniarocca31 +author: John Snow Labs +name: modello_finetunato +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modello_finetunato` is a English model originally trained by soniarocca31. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modello_finetunato_en_5.5.0_3.0_1726848805090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modello_finetunato_en_5.5.0_3.0_1726848805090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("modello_finetunato","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("modello_finetunato", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modello_finetunato| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/soniarocca31/modello_finetunato \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-mongolian_xlm_roberta_base_ner_hrl_mn.md b/docs/_posts/ahmedlone127/2024-09-20-mongolian_xlm_roberta_base_ner_hrl_mn.md new file mode 100644 index 00000000000000..cc17f6184ef295 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-mongolian_xlm_roberta_base_ner_hrl_mn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Mongolian mongolian_xlm_roberta_base_ner_hrl XlmRoBertaForTokenClassification from srglnjmb +author: John Snow Labs +name: mongolian_xlm_roberta_base_ner_hrl +date: 2024-09-20 +tags: [mn, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mongolian_xlm_roberta_base_ner_hrl` is a Mongolian model originally trained by srglnjmb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_xlm_roberta_base_ner_hrl_mn_5.5.0_3.0_1726843479456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_xlm_roberta_base_ner_hrl_mn_5.5.0_3.0_1726843479456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("mongolian_xlm_roberta_base_ner_hrl","mn") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("mongolian_xlm_roberta_base_ner_hrl", "mn") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mongolian_xlm_roberta_base_ner_hrl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mn| +|Size:|911.8 MB| + +## References + +https://huggingface.co/srglnjmb/Mongolian-xlm-roberta-base-ner-hrl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en.md new file mode 100644 index 00000000000000..d1c47d9aa9e4cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding10model_wyzhw_pipeline pipeline DistilBertForSequenceClassification from wyzhw +author: John Snow Labs +name: n_distilbert_twitterfin_padding10model_wyzhw_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding10model_wyzhw_pipeline` is a English model originally trained by wyzhw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en_5.5.0_3.0_1726860773633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding10model_wyzhw_pipeline_en_5.5.0_3.0_1726860773633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_twitterfin_padding10model_wyzhw_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_twitterfin_padding10model_wyzhw_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding10model_wyzhw_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wyzhw/N_distilbert_twitterfin_padding10model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ndd_mantisbt_test_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ndd_mantisbt_test_tags_pipeline_en.md new file mode 100644 index 00000000000000..15336e07cc8829 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ndd_mantisbt_test_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_mantisbt_test_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_mantisbt_test_tags_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_mantisbt_test_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_tags_pipeline_en_5.5.0_3.0_1726871320100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_mantisbt_test_tags_pipeline_en_5.5.0_3.0_1726871320100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ndd_mantisbt_test_tags_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ndd_mantisbt_test_tags_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_mantisbt_test_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-mantisbt_test-tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_en.md b/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_en.md new file mode 100644 index 00000000000000..96d133a0df72ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ndd_petclinic_test_content_tags DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_petclinic_test_content_tags +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_petclinic_test_content_tags` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_petclinic_test_content_tags_en_5.5.0_3.0_1726829907645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_petclinic_test_content_tags_en_5.5.0_3.0_1726829907645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_petclinic_test_content_tags","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ndd_petclinic_test_content_tags", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_petclinic_test_content_tags| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-petclinic_test-content_tags \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_pipeline_en.md new file mode 100644 index 00000000000000..5d69dd3e4c3cf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ndd_petclinic_test_content_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ndd_petclinic_test_content_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: ndd_petclinic_test_content_tags_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ndd_petclinic_test_content_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ndd_petclinic_test_content_tags_pipeline_en_5.5.0_3.0_1726829919462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ndd_petclinic_test_content_tags_pipeline_en_5.5.0_3.0_1726829919462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ndd_petclinic_test_content_tags_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ndd_petclinic_test_content_tags_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ndd_petclinic_test_content_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/NDD-petclinic_test-content_tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nepal_bhasa_dummy_model_thewitcher_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-nepal_bhasa_dummy_model_thewitcher_pipeline_en.md new file mode 100644 index 00000000000000..efca09755930a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nepal_bhasa_dummy_model_thewitcher_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_dummy_model_thewitcher_pipeline pipeline DistilBertForSequenceClassification from theWitcher +author: John Snow Labs +name: nepal_bhasa_dummy_model_thewitcher_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_dummy_model_thewitcher_pipeline` is a English model originally trained by theWitcher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dummy_model_thewitcher_pipeline_en_5.5.0_3.0_1726809197972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_dummy_model_thewitcher_pipeline_en_5.5.0_3.0_1726809197972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_dummy_model_thewitcher_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_dummy_model_thewitcher_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_dummy_model_thewitcher_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/theWitcher/new-dummy-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en.md new file mode 100644 index 00000000000000..106207976c5f4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ner_random2_seed1_twitter_roberta_large_2022_154m RoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed1_twitter_roberta_large_2022_154m +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed1_twitter_roberta_large_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1726853704141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1726853704141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random2_seed1_twitter_roberta_large_2022_154m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random2_seed1_twitter_roberta_large_2022_154m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed1_twitter_roberta_large_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed1-twitter-roberta-large-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_cw_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_cw_en.md new file mode 100644 index 00000000000000..10e1fdb3146578 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_cw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_cw RoBertaForTokenClassification from venkateshtata +author: John Snow Labs +name: nlp_cw +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_cw` is a English model originally trained by venkateshtata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_cw_en_5.5.0_3.0_1726847615408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_cw_en_5.5.0_3.0_1726847615408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("nlp_cw","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("nlp_cw", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_cw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|425.7 MB| + +## References + +https://huggingface.co/venkateshtata/nlp_cw \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-prueba_en.md b/docs/_posts/ahmedlone127/2024-09-20-prueba_en.md new file mode 100644 index 00000000000000..18fe7f8156b59d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-prueba_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English prueba DistilBertForSequenceClassification from rayosoftware +author: John Snow Labs +name: prueba +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prueba` is a English model originally trained by rayosoftware. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prueba_en_5.5.0_3.0_1726832556209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prueba_en_5.5.0_3.0_1726832556209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("prueba","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("prueba", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prueba| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rayosoftware/prueba \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_en.md b/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_en.md new file mode 100644 index 00000000000000..dff5e978946752 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English question_classification_minervabotteam DistilBertForSequenceClassification from MinervaBotTeam +author: John Snow Labs +name: question_classification_minervabotteam +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_classification_minervabotteam` is a English model originally trained by MinervaBotTeam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_classification_minervabotteam_en_5.5.0_3.0_1726849123627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_classification_minervabotteam_en_5.5.0_3.0_1726849123627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("question_classification_minervabotteam","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("question_classification_minervabotteam", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_classification_minervabotteam| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MinervaBotTeam/Question_Classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en.md b/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en.md new file mode 100644 index 00000000000000..0c4aed133c7a91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned RoBertaForTokenClassification from ajtamayoh +author: John Snow Labs +name: re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en_5.5.0_3.0_1726853301092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_en_5.5.0_3.0_1726853301092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|437.6 MB| + +## References + +https://huggingface.co/ajtamayoh/RE_NegREF_NSD_Nubes_Training_Test_dataset_RoBERTa_base_bne_fine_tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en.md new file mode 100644 index 00000000000000..785c6155609ebe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline pipeline RoBertaForTokenClassification from ajtamayoh +author: John Snow Labs +name: re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en_5.5.0_3.0_1726853324323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline_en_5.5.0_3.0_1726853324323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|re_negref_nsd_nubes_training_test_dataset_roberta_base_bne_fine_tuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.6 MB| + +## References + +https://huggingface.co/ajtamayoh/RE_NegREF_NSD_Nubes_Training_Test_dataset_RoBERTa_base_bne_fine_tuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-regression_roberta_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-regression_roberta_2_pipeline_en.md new file mode 100644 index 00000000000000..26f7b0fee69598 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-regression_roberta_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English regression_roberta_2_pipeline pipeline RoBertaForSequenceClassification from Svetlana0303 +author: John Snow Labs +name: regression_roberta_2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`regression_roberta_2_pipeline` is a English model originally trained by Svetlana0303. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/regression_roberta_2_pipeline_en_5.5.0_3.0_1726804423351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/regression_roberta_2_pipeline_en_5.5.0_3.0_1726804423351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("regression_roberta_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("regression_roberta_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|regression_roberta_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.3 MB| + +## References + +https://huggingface.co/Svetlana0303/Regression_roberta_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-results_cyh002_en.md b/docs/_posts/ahmedlone127/2024-09-20-results_cyh002_en.md new file mode 100644 index 00000000000000..e0d66336f685a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-results_cyh002_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English results_cyh002 DistilBertForSequenceClassification from cyh002 +author: John Snow Labs +name: results_cyh002 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_cyh002` is a English model originally trained by cyh002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_cyh002_en_5.5.0_3.0_1726860851107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_cyh002_en_5.5.0_3.0_1726860851107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_cyh002","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_cyh002", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_cyh002| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cyh002/results \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_en.md new file mode 100644 index 00000000000000..26067fc25087bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_agnews_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_agnews_padding20model RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: roberta_agnews_padding20model +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_agnews_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_agnews_padding20model_en_5.5.0_3.0_1726851897212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_agnews_padding20model_en_5.5.0_3.0_1726851897212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_agnews_padding20model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_agnews_padding20model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_agnews_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/Realgon/roberta_agnews_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ag_news_aktsvigun_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ag_news_aktsvigun_en.md new file mode 100644 index 00000000000000..cec1bd43db9376 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ag_news_aktsvigun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_ag_news_aktsvigun RoBertaForSequenceClassification from Aktsvigun +author: John Snow Labs +name: roberta_base_ag_news_aktsvigun +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ag_news_aktsvigun` is a English model originally trained by Aktsvigun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ag_news_aktsvigun_en_5.5.0_3.0_1726805129339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ag_news_aktsvigun_en_5.5.0_3.0_1726805129339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_ag_news_aktsvigun","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_ag_news_aktsvigun", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ag_news_aktsvigun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/Aktsvigun/roberta-base-ag_news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_finetuned_detests_wandb24_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_finetuned_detests_wandb24_pipeline_en.md new file mode 100644 index 00000000000000..dd1ecf31b5088c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_bne_finetuned_detests_wandb24_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_detests_wandb24_pipeline pipeline RoBertaForSequenceClassification from Pablo94 +author: John Snow Labs +name: roberta_base_bne_finetuned_detests_wandb24_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_detests_wandb24_pipeline` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_detests_wandb24_pipeline_en_5.5.0_3.0_1726850508499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_detests_wandb24_pipeline_en_5.5.0_3.0_1726850508499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_detests_wandb24_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_detests_wandb24_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_detests_wandb24_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|431.3 MB| + +## References + +https://huggingface.co/Pablo94/roberta-base-bne-finetuned-detests-wandb24 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_englishlawai_roberta_base_version4_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_englishlawai_roberta_base_version4_en.md new file mode 100644 index 00000000000000..80c06541777f21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_englishlawai_roberta_base_version4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_englishlawai_roberta_base_version4 RoBertaEmbeddings from Makabaka +author: John Snow Labs +name: roberta_base_englishlawai_roberta_base_version4 +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_englishlawai_roberta_base_version4` is a English model originally trained by Makabaka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_englishlawai_roberta_base_version4_en_5.5.0_3.0_1726793515075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_englishlawai_roberta_base_version4_en_5.5.0_3.0_1726793515075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_englishlawai_roberta_base_version4","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_englishlawai_roberta_base_version4","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_englishlawai_roberta_base_version4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/Makabaka/roberta-base-EnglishLawAI_roberta_base_version4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_finetuned_wallisian_manual_8ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_finetuned_wallisian_manual_8ep_pipeline_en.md new file mode 100644 index 00000000000000..6e84028391d5df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_finetuned_wallisian_manual_8ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_8ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_8ep_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_8ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_8ep_pipeline_en_5.5.0_3.0_1726793313586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_8ep_pipeline_en_5.5.0_3.0_1726793313586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_8ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_8ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_8ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.1 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-8ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_en.md new file mode 100644 index 00000000000000..364e28bfa9ab3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_legal_indian_courts_downstream_build_rr RoBertaForTokenClassification from MHGanainy +author: John Snow Labs +name: roberta_base_legal_indian_courts_downstream_build_rr +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_legal_indian_courts_downstream_build_rr` is a English model originally trained by MHGanainy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_legal_indian_courts_downstream_build_rr_en_5.5.0_3.0_1726862684883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_legal_indian_courts_downstream_build_rr_en_5.5.0_3.0_1726862684883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_legal_indian_courts_downstream_build_rr","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_legal_indian_courts_downstream_build_rr", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_legal_indian_courts_downstream_build_rr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/MHGanainy/roberta-base-legal-indian-courts-downstream-build_rr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_dolgorsureng_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_dolgorsureng_pipeline_mn.md new file mode 100644 index 00000000000000..963d11d7dff91f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_demo_dolgorsureng_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian roberta_base_ner_demo_dolgorsureng_pipeline pipeline RoBertaForTokenClassification from Dolgorsureng +author: John Snow Labs +name: roberta_base_ner_demo_dolgorsureng_pipeline +date: 2024-09-20 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_demo_dolgorsureng_pipeline` is a Mongolian model originally trained by Dolgorsureng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_dolgorsureng_pipeline_mn_5.5.0_3.0_1726847282840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_demo_dolgorsureng_pipeline_mn_5.5.0_3.0_1726847282840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ner_demo_dolgorsureng_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ner_demo_dolgorsureng_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_demo_dolgorsureng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Dolgorsureng/roberta-base-ner-demo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_mn.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_mn.md new file mode 100644 index 00000000000000..45e7ea9c4fe868 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_mn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Mongolian roberta_base_ner_test RoBertaForTokenClassification from Dondog +author: John Snow Labs +name: roberta_base_ner_test +date: 2024-09-20 +tags: [mn, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_test` is a Mongolian model originally trained by Dondog. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_test_mn_5.5.0_3.0_1726853635725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_test_mn_5.5.0_3.0_1726853635725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ner_test","mn") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_ner_test", "mn") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Dondog/roberta-base-ner-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_pipeline_mn.md new file mode 100644 index 00000000000000..cd38586ae9cec0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_ner_test_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian roberta_base_ner_test_pipeline pipeline RoBertaForTokenClassification from Dondog +author: John Snow Labs +name: roberta_base_ner_test_pipeline +date: 2024-09-20 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ner_test_pipeline` is a Mongolian model originally trained by Dondog. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_test_pipeline_mn_5.5.0_3.0_1726853660360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_test_pipeline_mn_5.5.0_3.0_1726853660360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ner_test_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ner_test_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Dondog/roberta-base-ner-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_polyglotner_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_polyglotner_en.md new file mode 100644 index 00000000000000..c3df9a8b168ab3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_polyglotner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_polyglotner RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_base_polyglotner +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_polyglotner` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_polyglotner_en_5.5.0_3.0_1726862634645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_polyglotner_en_5.5.0_3.0_1726862634645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_polyglotner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_polyglotner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_polyglotner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|449.1 MB| + +## References + +https://huggingface.co/CheccoCando/roberta-base_PolyglotNER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_sst2_modeltc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_sst2_modeltc_pipeline_en.md new file mode 100644 index 00000000000000..86930cdd4eebc1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_sst2_modeltc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_sst2_modeltc_pipeline pipeline RoBertaForSequenceClassification from ModelTC +author: John Snow Labs +name: roberta_base_sst2_modeltc_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_sst2_modeltc_pipeline` is a English model originally trained by ModelTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_sst2_modeltc_pipeline_en_5.5.0_3.0_1726851746124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_sst2_modeltc_pipeline_en_5.5.0_3.0_1726851746124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_sst2_modeltc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_sst2_modeltc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_sst2_modeltc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|445.6 MB| + +## References + +https://huggingface.co/ModelTC/roberta-base-sst2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_en.md new file mode 100644 index 00000000000000..dfd6d63a0be28c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_combined_generated_epoch_4 RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_combined_generated_epoch_4 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_combined_generated_epoch_4` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_4_en_5.5.0_3.0_1726853119606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_4_en_5.5.0_3.0_1726853119606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_combined_generated_epoch_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_combined_generated_epoch_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_combined_generated_epoch_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_Combined_Generated_epoch_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_pipeline_en.md new file mode 100644 index 00000000000000..a74f2dd816f160 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_combined_generated_epoch_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_combined_generated_epoch_4_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_combined_generated_epoch_4_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_combined_generated_epoch_4_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_4_pipeline_en_5.5.0_3.0_1726853134794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_combined_generated_epoch_4_pipeline_en_5.5.0_3.0_1726853134794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_combined_generated_epoch_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_combined_generated_epoch_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_combined_generated_epoch_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_Combined_Generated_epoch_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_empai_finetuned_def_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_empai_finetuned_def_en.md new file mode 100644 index 00000000000000..5c5107518a9065 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_empai_finetuned_def_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_empai_finetuned_def RoBertaEmbeddings from LuangMV97 +author: John Snow Labs +name: roberta_empai_finetuned_def +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_empai_finetuned_def` is a English model originally trained by LuangMV97. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_empai_finetuned_def_en_5.5.0_3.0_1726796193487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_empai_finetuned_def_en_5.5.0_3.0_1726796193487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_empai_finetuned_def","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_empai_finetuned_def","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_empai_finetuned_def| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/LuangMV97/RoBERTa_EmpAI_FineTuned_def \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_bne_capitel_ner_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_bne_capitel_ner_spanish_pipeline_es.md new file mode 100644 index 00000000000000..4ce2ac7adb6115 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_bne_capitel_ner_spanish_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish roberta_large_bne_capitel_ner_spanish_pipeline pipeline RoBertaForTokenClassification from Dulfary +author: John Snow Labs +name: roberta_large_bne_capitel_ner_spanish_pipeline +date: 2024-09-20 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bne_capitel_ner_spanish_pipeline` is a Castilian, Spanish model originally trained by Dulfary. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bne_capitel_ner_spanish_pipeline_es_5.5.0_3.0_1726862508529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bne_capitel_ner_spanish_pipeline_es_5.5.0_3.0_1726862508529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_bne_capitel_ner_spanish_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_bne_capitel_ner_spanish_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bne_capitel_ner_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Dulfary/roberta-large-bne-capitel-ner_spanish + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_epoch18_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_epoch18_en.md new file mode 100644 index 00000000000000..a0ef6e224530f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_finetuned_abbr_epoch18_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_abbr_epoch18 RoBertaForTokenClassification from karsimkh +author: John Snow Labs +name: roberta_large_finetuned_abbr_epoch18 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_abbr_epoch18` is a English model originally trained by karsimkh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_epoch18_en_5.5.0_3.0_1726847423233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_abbr_epoch18_en_5.5.0_3.0_1726847423233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_abbr_epoch18","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_abbr_epoch18", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_abbr_epoch18| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/karsimkh/roberta-large-finetuned-abbr-Epoch18 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en.md new file mode 100644 index 00000000000000..07682d2b42f962 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline pipeline RoBertaEmbeddings from HPL +author: John Snow Labs +name: roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline` is a English model originally trained by HPL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en_5.5.0_3.0_1726857658972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline_en_5.5.0_3.0_1726857658972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_unlabeled_labeled_gab_reddit_task_semeval2023_t10_270000sample_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/HPL/roberta-large-unlabeled-labeled-gab-reddit-task-semeval2023-t10-270000sample + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en.md new file mode 100644 index 00000000000000..65a9bba748aada --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline pipeline RoBertaEmbeddings from Hudee +author: John Snow Labs +name: roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline` is a English model originally trained by Hudee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en_5.5.0_3.0_1726793875961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline_en_5.5.0_3.0_1726793875961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_with_labeled_data_and_unlabeled_gab_reddit_semeval2023_task10_13300_labeled_sample_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Hudee/roberta-large-with-labeled-data-and-unlabeled-gab-reddit-semeval2023-task10-13300-labeled-sample + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_medium_word_chinese_cluecorpussmall_zh.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_medium_word_chinese_cluecorpussmall_zh.md new file mode 100644 index 00000000000000..50fc5454650d99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_medium_word_chinese_cluecorpussmall_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese roberta_medium_word_chinese_cluecorpussmall BertEmbeddings from uer +author: John Snow Labs +name: roberta_medium_word_chinese_cluecorpussmall +date: 2024-09-20 +tags: [zh, open_source, onnx, embeddings, bert] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_medium_word_chinese_cluecorpussmall` is a Chinese model originally trained by uer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_medium_word_chinese_cluecorpussmall_zh_5.5.0_3.0_1726806311796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_medium_word_chinese_cluecorpussmall_zh_5.5.0_3.0_1726806311796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("roberta_medium_word_chinese_cluecorpussmall","zh") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("roberta_medium_word_chinese_cluecorpussmall","zh") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_medium_word_chinese_cluecorpussmall| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|zh| +|Size:|287.4 MB| + +## References + +https://huggingface.co/uer/roberta-medium-word-chinese-cluecorpussmall \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_en.md new file mode 100644 index 00000000000000..9921786de3b2f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_nerc RoBertaForTokenClassification from pawlo2013 +author: John Snow Labs +name: roberta_nerc +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_nerc` is a English model originally trained by pawlo2013. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_nerc_en_5.5.0_3.0_1726853476626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_nerc_en_5.5.0_3.0_1726853476626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_nerc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_nerc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_nerc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|432.0 MB| + +## References + +https://huggingface.co/pawlo2013/roberta-nerc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_pipeline_en.md new file mode 100644 index 00000000000000..48ac8d326e3123 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_nerc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_nerc_pipeline pipeline RoBertaForTokenClassification from pawlo2013 +author: John Snow Labs +name: roberta_nerc_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_nerc_pipeline` is a English model originally trained by pawlo2013. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_nerc_pipeline_en_5.5.0_3.0_1726853511540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_nerc_pipeline_en_5.5.0_3.0_1726853511540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_nerc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_nerc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_nerc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|432.0 MB| + +## References + +https://huggingface.co/pawlo2013/roberta-nerc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_pipeline_en.md new file mode 100644 index 00000000000000..4fc134b6e1c070 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_sayula_popoluca_tagging_amir01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_sayula_popoluca_tagging_amir01_pipeline pipeline RoBertaForTokenClassification from Amir01 +author: John Snow Labs +name: roberta_sayula_popoluca_tagging_amir01_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_sayula_popoluca_tagging_amir01_pipeline` is a English model originally trained by Amir01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_amir01_pipeline_en_5.5.0_3.0_1726853272400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_sayula_popoluca_tagging_amir01_pipeline_en_5.5.0_3.0_1726853272400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_sayula_popoluca_tagging_amir01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_sayula_popoluca_tagging_amir01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_sayula_popoluca_tagging_amir01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/Amir01/roberta-pos-tagging + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_small_word_chinese_cluecorpussmall_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_small_word_chinese_cluecorpussmall_pipeline_zh.md new file mode 100644 index 00000000000000..186f14ded1469e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_small_word_chinese_cluecorpussmall_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese roberta_small_word_chinese_cluecorpussmall_pipeline pipeline BertEmbeddings from uer +author: John Snow Labs +name: roberta_small_word_chinese_cluecorpussmall_pipeline +date: 2024-09-20 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_small_word_chinese_cluecorpussmall_pipeline` is a Chinese model originally trained by uer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_small_word_chinese_cluecorpussmall_pipeline_zh_5.5.0_3.0_1726805984748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_small_word_chinese_cluecorpussmall_pipeline_zh_5.5.0_3.0_1726805984748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_small_word_chinese_cluecorpussmall_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_small_word_chinese_cluecorpussmall_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_small_word_chinese_cluecorpussmall_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|240.3 MB| + +## References + +https://huggingface.co/uer/roberta-small-word-chinese-cluecorpussmall + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_japanese_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_japanese_en.md new file mode 100644 index 00000000000000..f0404bc9094f39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_tagalog_base_ft_udpos213_japanese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_japanese RoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_japanese +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_japanese` is a English model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_japanese_en_5.5.0_3.0_1726862140811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_japanese_en_5.5.0_3.0_1726862140811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_japanese","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_japanese", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_japanese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/iceman2434/roberta-tagalog-base-ft-udpos213-ja \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_en.md new file mode 100644 index 00000000000000..8efa91bf3aae9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English rubert_tiny_sberquad_6ep_1 BertForQuestionAnswering from Mathnub +author: John Snow Labs +name: rubert_tiny_sberquad_6ep_1 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny_sberquad_6ep_1` is a English model originally trained by Mathnub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny_sberquad_6ep_1_en_5.5.0_3.0_1726820368432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny_sberquad_6ep_1_en_5.5.0_3.0_1726820368432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("rubert_tiny_sberquad_6ep_1","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("rubert_tiny_sberquad_6ep_1", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny_sberquad_6ep_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|43.8 MB| + +## References + +https://huggingface.co/Mathnub/rubert-tiny-sberquad-6ep-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_pipeline_en.md new file mode 100644 index 00000000000000..f874c6e1611bfa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-rubert_tiny_sberquad_6ep_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English rubert_tiny_sberquad_6ep_1_pipeline pipeline BertForQuestionAnswering from Mathnub +author: John Snow Labs +name: rubert_tiny_sberquad_6ep_1_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny_sberquad_6ep_1_pipeline` is a English model originally trained by Mathnub. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny_sberquad_6ep_1_pipeline_en_5.5.0_3.0_1726820370859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny_sberquad_6ep_1_pipeline_en_5.5.0_3.0_1726820370859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_tiny_sberquad_6ep_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_tiny_sberquad_6ep_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny_sberquad_6ep_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|43.8 MB| + +## References + +https://huggingface.co/Mathnub/rubert-tiny-sberquad-6ep-1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-scam_detection_zelchy_en.md b/docs/_posts/ahmedlone127/2024-09-20-scam_detection_zelchy_en.md new file mode 100644 index 00000000000000..f41625507dec07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-scam_detection_zelchy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scam_detection_zelchy BertForSequenceClassification from zelchy +author: John Snow Labs +name: scam_detection_zelchy +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scam_detection_zelchy` is a English model originally trained by zelchy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scam_detection_zelchy_en_5.5.0_3.0_1726828994543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scam_detection_zelchy_en_5.5.0_3.0_1726828994543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("scam_detection_zelchy","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("scam_detection_zelchy", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scam_detection_zelchy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/zelchy/scam-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_embedding_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_embedding_en.md new file mode 100644 index 00000000000000..1b69d306929937 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_embedding_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_embedding BertSentenceEmbeddings from CH3COOK +author: John Snow Labs +name: sent_bert_base_embedding +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_embedding` is a English model originally trained by CH3COOK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_embedding_en_5.5.0_3.0_1726868130007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_embedding_en_5.5.0_3.0_1726868130007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_embedding","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_embedding","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_embedding| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/CH3COOK/bert-base-embedding \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_english_russian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_english_russian_cased_en.md new file mode 100644 index 00000000000000..125720683f6734 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_bert_base_english_russian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_russian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_russian_cased +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_russian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_russian_cased_en_5.5.0_3.0_1726867228351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_russian_cased_en_5.5.0_3.0_1726867228351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_russian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_russian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_russian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|428.3 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ru-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_en.md new file mode 100644 index 00000000000000..59d0a9ef5b5961 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_mental_bert BertSentenceEmbeddings from Zamoranesis +author: John Snow Labs +name: sent_mental_bert +date: 2024-09-20 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mental_bert` is a English model originally trained by Zamoranesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mental_bert_en_5.5.0_3.0_1726801301293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mental_bert_en_5.5.0_3.0_1726801301293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_mental_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_mental_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mental_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/Zamoranesis/mental_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_pipeline_en.md new file mode 100644 index 00000000000000..4ddf07d3892d85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sent_mental_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_mental_bert_pipeline pipeline BertSentenceEmbeddings from Zamoranesis +author: John Snow Labs +name: sent_mental_bert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mental_bert_pipeline` is a English model originally trained by Zamoranesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mental_bert_pipeline_en_5.5.0_3.0_1726801319792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mental_bert_pipeline_en_5.5.0_3.0_1726801319792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_mental_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_mental_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mental_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Zamoranesis/mental_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_base_rslora_en.md b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_base_rslora_en.md new file mode 100644 index 00000000000000..f11791c9154561 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sentiment_analysis_base_rslora_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_base_rslora RoBertaForSequenceClassification from Shotaro30678 +author: John Snow Labs +name: sentiment_analysis_base_rslora +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_base_rslora` is a English model originally trained by Shotaro30678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_base_rslora_en_5.5.0_3.0_1726851633427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_base_rslora_en_5.5.0_3.0_1726851633427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_base_rslora","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_base_rslora", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_base_rslora| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.8 MB| + +## References + +https://huggingface.co/Shotaro30678/sentiment-analysis-base-rslora \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-splade_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-splade_roberta_pipeline_en.md new file mode 100644 index 00000000000000..f783e56495e26b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-splade_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English splade_roberta_pipeline pipeline RoBertaEmbeddings from maximedb +author: John Snow Labs +name: splade_roberta_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`splade_roberta_pipeline` is a English model originally trained by maximedb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/splade_roberta_pipeline_en_5.5.0_3.0_1726796612748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/splade_roberta_pipeline_en_5.5.0_3.0_1726796612748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("splade_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("splade_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|splade_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/maximedb/splade-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_it.md b/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_it.md new file mode 100644 index 00000000000000..aa2062f3c0f1f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-stereotype_italian_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian stereotype_italian BertForSequenceClassification from aequa-tech +author: John Snow Labs +name: stereotype_italian +date: 2024-09-20 +tags: [it, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereotype_italian` is a Italian model originally trained by aequa-tech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereotype_italian_it_5.5.0_3.0_1726859952233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereotype_italian_it_5.5.0_3.0_1726859952233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("stereotype_italian","it") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("stereotype_italian", "it") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereotype_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|691.9 MB| + +## References + +https://huggingface.co/aequa-tech/stereotype-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-stress_mentalbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-stress_mentalbert_pipeline_en.md new file mode 100644 index 00000000000000..00500657e7ed4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-stress_mentalbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stress_mentalbert_pipeline pipeline BertForSequenceClassification from tiya1012 +author: John Snow Labs +name: stress_mentalbert_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stress_mentalbert_pipeline` is a English model originally trained by tiya1012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stress_mentalbert_pipeline_en_5.5.0_3.0_1726803968928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stress_mentalbert_pipeline_en_5.5.0_3.0_1726803968928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stress_mentalbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stress_mentalbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stress_mentalbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.9 MB| + +## References + +https://huggingface.co/tiya1012/stress_mentalbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-t_100002_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-t_100002_pipeline_en.md new file mode 100644 index 00000000000000..7cc87c45c76c60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-t_100002_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_100002_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_100002_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_100002_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_100002_pipeline_en_5.5.0_3.0_1726852221483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_100002_pipeline_en_5.5.0_3.0_1726852221483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_100002_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_100002_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_100002_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_100002 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test1_onlysmokehuazi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-test1_onlysmokehuazi_pipeline_en.md new file mode 100644 index 00000000000000..337443d4c0e1ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test1_onlysmokehuazi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test1_onlysmokehuazi_pipeline pipeline DistilBertForSequenceClassification from Onlysmokehuazi +author: John Snow Labs +name: test1_onlysmokehuazi_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_onlysmokehuazi_pipeline` is a English model originally trained by Onlysmokehuazi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_onlysmokehuazi_pipeline_en_5.5.0_3.0_1726832519545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_onlysmokehuazi_pipeline_en_5.5.0_3.0_1726832519545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test1_onlysmokehuazi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test1_onlysmokehuazi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_onlysmokehuazi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Onlysmokehuazi/test1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test2_en.md b/docs/_posts/ahmedlone127/2024-09-20-test2_en.md new file mode 100644 index 00000000000000..5f9702b3d0465c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test2_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English test2 DistilBertForTokenClassification from yam1ke +author: John Snow Labs +name: test2 +date: 2024-09-20 +tags: [bert, en, open_source, token_classification, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test2` is a English model originally trained by yam1ke. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test2_en_5.5.0_3.0_1726812626002.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test2_en_5.5.0_3.0_1726812626002.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + + +tokenClassifier = DistilBertForTokenClassification.pretrained("test2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenClassifier]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("embeddings") + +val tokenClassifier = DistilBertForTokenClassification + .pretrained("test2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenClassifier)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +References + +https://huggingface.co/yam1ke/test2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_pipeline_en.md new file mode 100644 index 00000000000000..b8b8a012f10d8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-test_whisper_tiny_thai_kwanchiva_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test_whisper_tiny_thai_kwanchiva_pipeline pipeline WhisperForCTC from kwanchiva +author: John Snow Labs +name: test_whisper_tiny_thai_kwanchiva_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_whisper_tiny_thai_kwanchiva_pipeline` is a English model originally trained by kwanchiva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kwanchiva_pipeline_en_5.5.0_3.0_1726813885848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_whisper_tiny_thai_kwanchiva_pipeline_en_5.5.0_3.0_1726813885848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_whisper_tiny_thai_kwanchiva_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_whisper_tiny_thai_kwanchiva_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_whisper_tiny_thai_kwanchiva_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/kwanchiva/test-whisper-tiny-th + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-text_classification_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-text_classification_100_pipeline_en.md new file mode 100644 index 00000000000000..8291bd82766d2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-text_classification_100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classification_100_pipeline pipeline DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: text_classification_100_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_100_pipeline` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_100_pipeline_en_5.5.0_3.0_1726841351309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_100_pipeline_en_5.5.0_3.0_1726841351309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_classification_100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_classification_100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/Text_Classification_100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-text_clf_en.md b/docs/_posts/ahmedlone127/2024-09-20-text_clf_en.md new file mode 100644 index 00000000000000..cabf2b28816135 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-text_clf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_clf DistilBertForSequenceClassification from SLKpnu +author: John Snow Labs +name: text_clf +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_clf` is a English model originally trained by SLKpnu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_clf_en_5.5.0_3.0_1726823798299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_clf_en_5.5.0_3.0_1726823798299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_clf","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_clf", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_clf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SLKpnu/text_clf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_vicman229_en.md b/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_vicman229_en.md new file mode 100644 index 00000000000000..aebc6e649d4780 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-tmp_trainer_vicman229_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tmp_trainer_vicman229 DistilBertForSequenceClassification from Vicman229 +author: John Snow Labs +name: tmp_trainer_vicman229 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp_trainer_vicman229` is a English model originally trained by Vicman229. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp_trainer_vicman229_en_5.5.0_3.0_1726823748924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp_trainer_vicman229_en_5.5.0_3.0_1726823748924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_vicman229","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_vicman229", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp_trainer_vicman229| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Vicman229/tmp_trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-trained_sentiment_analyzer_ipex_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-trained_sentiment_analyzer_ipex_pipeline_en.md new file mode 100644 index 00000000000000..e923e74dd2245a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-trained_sentiment_analyzer_ipex_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trained_sentiment_analyzer_ipex_pipeline pipeline DistilBertForSequenceClassification from redbaron007 +author: John Snow Labs +name: trained_sentiment_analyzer_ipex_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trained_sentiment_analyzer_ipex_pipeline` is a English model originally trained by redbaron007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trained_sentiment_analyzer_ipex_pipeline_en_5.5.0_3.0_1726823732750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trained_sentiment_analyzer_ipex_pipeline_en_5.5.0_3.0_1726823732750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trained_sentiment_analyzer_ipex_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trained_sentiment_analyzer_ipex_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trained_sentiment_analyzer_ipex_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/redbaron007/trained-sentiment-analyzer-ipex + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-trainerh2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-trainerh2_pipeline_en.md new file mode 100644 index 00000000000000..6e62113d0b6002 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-trainerh2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trainerh2_pipeline pipeline DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainerh2_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainerh2_pipeline` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainerh2_pipeline_en_5.5.0_3.0_1726848938623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainerh2_pipeline_en_5.5.0_3.0_1726848938623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trainerh2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trainerh2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainerh2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainerH2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-twitter_roberta_base_mar2021_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-twitter_roberta_base_mar2021_pipeline_en.md new file mode 100644 index 00000000000000..5e216cc7ccdd14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-twitter_roberta_base_mar2021_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_mar2021_pipeline pipeline RoBertaEmbeddings from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_mar2021_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_mar2021_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_mar2021_pipeline_en_5.5.0_3.0_1726816167077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_mar2021_pipeline_en_5.5.0_3.0_1726816167077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_mar2021_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_mar2021_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_mar2021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-mar2021 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_en.md new file mode 100644 index 00000000000000..65dc80fa079c8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_42_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English uned_tfg_08_42 RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_42 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_42` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_42_en_5.5.0_3.0_1726852227572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_42_en_5.5.0_3.0_1726852227572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_42","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_42", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_42| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|426.5 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.42 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_en.md new file mode 100644 index 00000000000000..fc0cc882419c4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English uned_tfg_08_56 RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_56 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_56` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_56_en_5.5.0_3.0_1726851486673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_56_en_5.5.0_3.0_1726851486673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_56","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_56", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_56| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.6 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.56 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_64_mas_frecuentes_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_64_mas_frecuentes_pipeline_en.md new file mode 100644 index 00000000000000..c23f374e016c4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_64_mas_frecuentes_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_64_mas_frecuentes_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_64_mas_frecuentes_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_64_mas_frecuentes_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_64_mas_frecuentes_pipeline_en_5.5.0_3.0_1726804387863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_64_mas_frecuentes_pipeline_en_5.5.0_3.0_1726804387863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("uned_tfg_08_64_mas_frecuentes_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("uned_tfg_08_64_mas_frecuentes_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_64_mas_frecuentes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|429.9 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.64_mas_frecuentes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_base_pashto_ihanif_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_pashto_ihanif_en.md new file mode 100644 index 00000000000000..8720a87d41cc7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_base_pashto_ihanif_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_pashto_ihanif WhisperForCTC from ihanif +author: John Snow Labs +name: whisper_base_pashto_ihanif +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_pashto_ihanif` is a English model originally trained by ihanif. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_en_5.5.0_3.0_1726810197147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_pashto_ihanif_en_5.5.0_3.0_1726810197147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_pashto_ihanif","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_pashto_ihanif", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_pashto_ihanif| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.9 MB| + +## References + +https://huggingface.co/ihanif/whisper-base-ps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_pipeline_hi.md new file mode 100644 index 00000000000000..06d45fb0cdc2db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_hindi_ver2_pipeline pipeline WhisperForCTC from saxenagauravhf +author: John Snow Labs +name: whisper_small_hindi_ver2_pipeline +date: 2024-09-20 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_ver2_pipeline` is a Hindi model originally trained by saxenagauravhf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_ver2_pipeline_hi_5.5.0_3.0_1726814198361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_ver2_pipeline_hi_5.5.0_3.0_1726814198361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_hindi_ver2_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_hindi_ver2_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_ver2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/saxenagauravhf/whisper-small-hi-ver2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_pipeline_hi.md new file mode 100644 index 00000000000000..2a6d2112340887 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_indonesian_zeinhasan_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_indonesian_zeinhasan_pipeline pipeline WhisperForCTC from zeinhasan +author: John Snow Labs +name: whisper_small_indonesian_zeinhasan_pipeline +date: 2024-09-20 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_zeinhasan_pipeline` is a Hindi model originally trained by zeinhasan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_zeinhasan_pipeline_hi_5.5.0_3.0_1726811972858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_zeinhasan_pipeline_hi_5.5.0_3.0_1726811972858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_indonesian_zeinhasan_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_indonesian_zeinhasan_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_zeinhasan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|389.8 MB| + +## References + +https://huggingface.co/zeinhasan/whisper-small-id + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_en.md new file mode 100644 index 00000000000000..a4c3ffe14317f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_telugu WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_tiny_telugu +date: 2024-09-20 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_telugu` is a English model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_telugu_en_5.5.0_3.0_1726814263563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_telugu_en_5.5.0_3.0_1726814263563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_telugu","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_telugu", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_telugu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|391.1 MB| + +## References + +https://huggingface.co/parambharat/whisper-tiny-te \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_pipeline_en.md new file mode 100644 index 00000000000000..c91330c59ba4de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_telugu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_telugu_pipeline pipeline WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_tiny_telugu_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_telugu_pipeline` is a English model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_telugu_pipeline_en_5.5.0_3.0_1726814283408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_telugu_pipeline_en_5.5.0_3.0_1726814283408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_telugu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_telugu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_telugu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.1 MB| + +## References + +https://huggingface.co/parambharat/whisper-tiny-te + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en.md new file mode 100644 index 00000000000000..3aba9703a47ac7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline pipeline XlmRoBertaForTokenClassification from Lee-soha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline` is a English model originally trained by Lee-soha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en_5.5.0_3.0_1726844590280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline_en_5.5.0_3.0_1726844590280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_lee_soha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/Lee-soha/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en.md new file mode 100644 index 00000000000000..4835be958bd2d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_andrew45_pipeline pipeline XlmRoBertaForTokenClassification from andrew45 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_andrew45_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_andrew45_pipeline` is a English model originally trained by andrew45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en_5.5.0_3.0_1726843489036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_andrew45_pipeline_en_5.5.0_3.0_1726843489036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_andrew45_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_andrew45_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_andrew45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/andrew45/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_yurit04_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_yurit04_en.md new file mode 100644 index 00000000000000..12771ba64deb06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_french_yurit04_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_yurit04 XlmRoBertaForTokenClassification from yurit04 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_yurit04 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_yurit04` is a English model originally trained by yurit04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yurit04_en_5.5.0_3.0_1726844116316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yurit04_en_5.5.0_3.0_1726844116316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_yurit04","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_yurit04", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_yurit04| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/yurit04/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_zardian_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_zardian_en.md new file mode 100644 index 00000000000000..cb62268e2f073c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_french_zardian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_zardian XlmRoBertaForTokenClassification from Zardian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_zardian +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_zardian` is a English model originally trained by Zardian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zardian_en_5.5.0_3.0_1726844250806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zardian_en_5.5.0_3.0_1726844250806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_zardian","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_zardian", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_zardian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Zardian/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_vantaa32_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_vantaa32_en.md new file mode 100644 index 00000000000000..f14fb49da26bd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_german_vantaa32_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_vantaa32 XlmRoBertaForTokenClassification from vantaa32 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_vantaa32 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_vantaa32` is a English model originally trained by vantaa32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vantaa32_en_5.5.0_3.0_1726843410192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vantaa32_en_5.5.0_3.0_1726843410192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_vantaa32","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_vantaa32", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_vantaa32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/vantaa32/xlm-roberta-base-finetuned_panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en.md new file mode 100644 index 00000000000000..e0f623d945a7e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en_5.5.0_3.0_1726800470246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1_en_5.5.0_3.0_1726800470246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_delete_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|793.6 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_delete-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en.md new file mode 100644 index 00000000000000..f3f6efa0d678d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en_5.5.0_3.0_1726865361736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline_en_5.5.0_3.0_1726865361736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_spanish_trimmed_spanish_15000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|359.2 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-es-trimmed-es-15000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_emotion_spanish_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_emotion_spanish_pipeline_en.md new file mode 100644 index 00000000000000..ed1688cacd00af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_emotion_spanish_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_emotion_spanish_pipeline pipeline XlmRoBertaForSequenceClassification from Cesar42 +author: John Snow Labs +name: xlm_roberta_emotion_spanish_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_emotion_spanish_pipeline` is a English model originally trained by Cesar42. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_emotion_spanish_pipeline_en_5.5.0_3.0_1726865240242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_emotion_spanish_pipeline_en_5.5.0_3.0_1726865240242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_emotion_spanish_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_emotion_spanish_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_emotion_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Cesar42/xlm-roberta-emotion-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en.md new file mode 100644 index 00000000000000..4d83cf1fae7add --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_english_chinese_train_shuffled_1986_test2000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_chinese_train_shuffled_1986_test2000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_chinese_train_shuffled_1986_test2000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1726865900090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_train_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1726865900090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_english_chinese_train_shuffled_1986_test2000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_english_chinese_train_shuffled_1986_test2000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_chinese_train_shuffled_1986_test2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|825.9 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-zh-train_shuffled-1986-test2000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en.md new file mode 100644 index 00000000000000..e73f80a67361ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1726865046378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline_en_5.5.0_3.0_1726865046378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_sinhalese_english_all_shuffled_1986_test2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.5 MB| + +## References + +https://huggingface.co/patpizio/xlmr-si-en-all_shuffled-1986-test2000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_pipeline_en.md new file mode 100644 index 00000000000000..afd53dbdffb43b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xnli_xlm_r_only_french_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_french_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_french_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_french_pipeline_en_5.5.0_3.0_1726872765998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_french_pipeline_en_5.5.0_3.0_1726872765998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xnli_xlm_r_only_french_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xnli_xlm_r_only_french_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|810.8 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_hindi_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_hindi_en.md new file mode 100644 index 00000000000000..d241590f3cca62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_hindi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xnli_xlm_r_only_hindi XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_hindi +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_hindi` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_hindi_en_5.5.0_3.0_1726845592925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_hindi_en_5.5.0_3.0_1726845592925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_hindi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_hindi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_hindi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|803.2 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q1_75p_filtered_random_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_75p_filtered_random_en.md new file mode 100644 index 00000000000000..09f336254f305d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_75p_filtered_random_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q1_75p_filtered_random RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_75p_filtered_random +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_75p_filtered_random` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_75p_filtered_random_en_5.5.0_3.0_1726958120107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_75p_filtered_random_en_5.5.0_3.0_1726958120107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q1_75p_filtered_random","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q1_75p_filtered_random","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_75p_filtered_random| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-75p-filtered-random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q4_50p_filtered_random_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q4_50p_filtered_random_en.md new file mode 100644 index 00000000000000..b0a918ce372a50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q4_50p_filtered_random_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q4_50p_filtered_random RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_50p_filtered_random +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_50p_filtered_random` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_en_5.5.0_3.0_1726957640736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_en_5.5.0_3.0_1726957640736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_random","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_random","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_50p_filtered_random| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-50p-filtered-random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-afriberta_base_finetuned_igbo_2e_4_en.md b/docs/_posts/ahmedlone127/2024-09-21-afriberta_base_finetuned_igbo_2e_4_en.md new file mode 100644 index 00000000000000..40c4a4226ae4e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-afriberta_base_finetuned_igbo_2e_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afriberta_base_finetuned_igbo_2e_4 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_base_finetuned_igbo_2e_4 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afriberta_base_finetuned_igbo_2e_4` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_igbo_2e_4_en_5.5.0_3.0_1726883669364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_igbo_2e_4_en_5.5.0_3.0_1726883669364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_igbo_2e_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_igbo_2e_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_base_finetuned_igbo_2e_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|415.2 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-base-finetuned-igbo-2e-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-albert_model__29_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-albert_model__29_2_pipeline_en.md new file mode 100644 index 00000000000000..4f0dc48cf54c3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-albert_model__29_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_model__29_2_pipeline pipeline DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: albert_model__29_2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_model__29_2_pipeline` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_model__29_2_pipeline_en_5.5.0_3.0_1726953718477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_model__29_2_pipeline_en_5.5.0_3.0_1726953718477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_model__29_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_model__29_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_model__29_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/albert_model__29_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-amazon_topical_chat_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-amazon_topical_chat_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..b891dcfb0297bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-amazon_topical_chat_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_topical_chat_sentiment_pipeline pipeline DistilBertForSequenceClassification from reichenbach +author: John Snow Labs +name: amazon_topical_chat_sentiment_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_topical_chat_sentiment_pipeline` is a English model originally trained by reichenbach. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_topical_chat_sentiment_pipeline_en_5.5.0_3.0_1726889140922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_topical_chat_sentiment_pipeline_en_5.5.0_3.0_1726889140922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amazon_topical_chat_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amazon_topical_chat_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_topical_chat_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/reichenbach/amazon_topical_chat_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-asr_iot_en.md b/docs/_posts/ahmedlone127/2024-09-21-asr_iot_en.md new file mode 100644 index 00000000000000..255bb3079b1907 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-asr_iot_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English asr_iot WhisperForCTC from Phong1807 +author: John Snow Labs +name: asr_iot +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`asr_iot` is a English model originally trained by Phong1807. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/asr_iot_en_5.5.0_3.0_1726910973206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/asr_iot_en_5.5.0_3.0_1726910973206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("asr_iot","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("asr_iot", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|asr_iot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Phong1807/ASR_IoT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en.md new file mode 100644 index 00000000000000..5c87f448f139d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline pipeline BertForSequenceClassification from Saripudin +author: John Snow Labs +name: autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline` is a English model originally trained by Saripudin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en_5.5.0_3.0_1726956258247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline_en_5.5.0_3.0_1726956258247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_model_datasaur_nzfmymy2nzu_owy1njq2zjk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Saripudin/autotrain-model-datasaur-NzFmYmY2NzU-OWY1NjQ2Zjk + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-autotrain_tes2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-autotrain_tes2_pipeline_en.md new file mode 100644 index 00000000000000..9c61a635e10126 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-autotrain_tes2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_tes2_pipeline pipeline RoBertaForSequenceClassification from vuk123 +author: John Snow Labs +name: autotrain_tes2_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_tes2_pipeline` is a English model originally trained by vuk123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_tes2_pipeline_en_5.5.0_3.0_1726900098520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_tes2_pipeline_en_5.5.0_3.0_1726900098520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_tes2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_tes2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_tes2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|418.8 MB| + +## References + +https://huggingface.co/vuk123/autotrain-tes2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-banking77_distilbert_jidhu_mohan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-banking77_distilbert_jidhu_mohan_pipeline_en.md new file mode 100644 index 00000000000000..0b9f2326ccaffb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-banking77_distilbert_jidhu_mohan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English banking77_distilbert_jidhu_mohan_pipeline pipeline DistilBertForSequenceClassification from jidhu-mohan +author: John Snow Labs +name: banking77_distilbert_jidhu_mohan_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banking77_distilbert_jidhu_mohan_pipeline` is a English model originally trained by jidhu-mohan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banking77_distilbert_jidhu_mohan_pipeline_en_5.5.0_3.0_1726924197494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banking77_distilbert_jidhu_mohan_pipeline_en_5.5.0_3.0_1726924197494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("banking77_distilbert_jidhu_mohan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("banking77_distilbert_jidhu_mohan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banking77_distilbert_jidhu_mohan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/jidhu-mohan/banking77-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en.md new file mode 100644 index 00000000000000..00098ad5f2b962 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en_5.5.0_3.0_1726946868480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_en_5.5.0_3.0_1726946868480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915010230 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en.md new file mode 100644 index 00000000000000..56481bdb323b03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en_5.5.0_3.0_1726946887770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline_en_5.5.0_3.0_1726946887770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915010230_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915010230 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en.md new file mode 100644 index 00000000000000..02877518c8d87e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en_5.5.0_3.0_1726929009669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818_en_5.5.0_3.0_1726929009669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915122818| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915122818 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en.md new file mode 100644 index 00000000000000..0648e9b15d56af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en_5.5.0_3.0_1726946321521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99_en_5.5.0_3.0_1726946321521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_0_0005_wd_0_01_dp_0_99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-0.0005-wd-0.01-dp-0.99 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en.md new file mode 100644 index 00000000000000..e9531520c2c4db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en_5.5.0_3.0_1726946619944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500_en_5.5.0_3.0_1726946619944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_87_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_500| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.87-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-500 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en.md new file mode 100644 index 00000000000000..3adae85d63cf54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en_5.5.0_3.0_1726946565998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline_en_5.5.0_3.0_1726946565998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_001_dp_0_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.001-dp-0.9 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_en.md new file mode 100644 index 00000000000000..9a7c882faecc34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_1qahistory BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_1qahistory +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_1qahistory` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_1qahistory_en_5.5.0_3.0_1726946740836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_1qahistory_en_5.5.0_3.0_1726946740836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_quac_1qahistory","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_quac_1qahistory", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_1qahistory| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-1QAhistory \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_en.md new file mode 100644 index 00000000000000..dd5582a6992810 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_squad_v1_finetuned_clickbait_detection BertForQuestionAnswering from abdulmanaam +author: John Snow Labs +name: bert_base_uncased_squad_v1_finetuned_clickbait_detection +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squad_v1_finetuned_clickbait_detection` is a English model originally trained by abdulmanaam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_finetuned_clickbait_detection_en_5.5.0_3.0_1726947064641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_finetuned_clickbait_detection_en_5.5.0_3.0_1726947064641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squad_v1_finetuned_clickbait_detection","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squad_v1_finetuned_clickbait_detection", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squad_v1_finetuned_clickbait_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/abdulmanaam/bert-base-uncased-squad-v1-finetuned-clickbait-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en.md new file mode 100644 index 00000000000000..05c4b42c97c168 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline pipeline BertForQuestionAnswering from abdulmanaam +author: John Snow Labs +name: bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline` is a English model originally trained by abdulmanaam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en_5.5.0_3.0_1726947083855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline_en_5.5.0_3.0_1726947083855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squad_v1_finetuned_clickbait_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/abdulmanaam/bert-base-uncased-squad-v1-finetuned-clickbait-detection + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en.md new file mode 100644 index 00000000000000..a3de40b9026012 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en_5.5.0_3.0_1726889361014.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline_en_5.5.0_3.0_1726889361014.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_token_itr0_0_0001_all_01_03_2022_04_48_27_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/bert-base-uncased_token_itr0_0.0001_all_01_03_2022-04_48_27 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en.md new file mode 100644 index 00000000000000..c1b3c588110dc6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline pipeline DistilBertForSequenceClassification from ArafatBHossain +author: John Snow Labs +name: bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline` is a English model originally trained by ArafatBHossain. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726953068052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline_en_5.5.0_3.0_1726953068052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_distilled_multi_teacher_model_random_mind_epoch7_alpha0_8_refined_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArafatBHossain/bert-distilled-multi_teacher_model_random_mind_epoch7_alpha0.8_refined + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_large_uncased_finetuned_squad_v2_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_large_uncased_finetuned_squad_v2_en.md new file mode 100644 index 00000000000000..8d2662726afb40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_large_uncased_finetuned_squad_v2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_finetuned_squad_v2 BertForQuestionAnswering from ALOQAS +author: John Snow Labs +name: bert_large_uncased_finetuned_squad_v2 +date: 2024-09-21 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_finetuned_squad_v2` is a English model originally trained by ALOQAS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_squad_v2_en_5.5.0_3.0_1726946691672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_squad_v2_en_5.5.0_3.0_1726946691672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_finetuned_squad_v2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_finetuned_squad_v2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_finetuned_squad_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ALOQAS/bert-large-uncased-finetuned-squad-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en.md new file mode 100644 index 00000000000000..7103cc148f18d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en_5.5.0_3.0_1726953548753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1_en_5.5.0_3.0_1726953548753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_llmoversight_0_5_nodropsus_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-llmOversight-0.5-noDropSus_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_3__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_3__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..e2ed9d6b4ec186 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-brwac_v1_3__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brwac_v1_3__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_3__checkpoint_last_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_3__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint_last_pipeline_en_5.5.0_3.0_1726943612362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_3__checkpoint_last_pipeline_en_5.5.0_3.0_1726943612362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brwac_v1_3__checkpoint_last_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brwac_v1_3__checkpoint_last_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_3__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.6 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_3__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_en.md b/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_en.md new file mode 100644 index 00000000000000..0ed58ee40c125d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English buntesgelaber RoBertaEmbeddings from Janst1000 +author: John Snow Labs +name: buntesgelaber +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`buntesgelaber` is a English model originally trained by Janst1000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/buntesgelaber_en_5.5.0_3.0_1726934185614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/buntesgelaber_en_5.5.0_3.0_1726934185614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("buntesgelaber","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("buntesgelaber","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|buntesgelaber| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|312.0 MB| + +## References + +https://huggingface.co/Janst1000/buntesgelaber \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_pipeline_en.md new file mode 100644 index 00000000000000..eea55926ce3a3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-buntesgelaber_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English buntesgelaber_pipeline pipeline RoBertaEmbeddings from Janst1000 +author: John Snow Labs +name: buntesgelaber_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`buntesgelaber_pipeline` is a English model originally trained by Janst1000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/buntesgelaber_pipeline_en_5.5.0_3.0_1726934199700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/buntesgelaber_pipeline_en_5.5.0_3.0_1726934199700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("buntesgelaber_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("buntesgelaber_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|buntesgelaber_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|312.0 MB| + +## References + +https://huggingface.co/Janst1000/buntesgelaber + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_en.md new file mode 100644 index 00000000000000..5a4994a87f624b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_confunius_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_confunius RoBertaEmbeddings from confunius +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_confunius +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_confunius` is a English model originally trained by confunius. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_confunius_en_5.5.0_3.0_1726934698843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_confunius_en_5.5.0_3.0_1726934698843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_confunius","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_confunius","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_confunius| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/confunius/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_jaiiiiii_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_jaiiiiii_en.md new file mode 100644 index 00000000000000..6ea4faef930779 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_jaiiiiii_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_jaiiiiii RoBertaEmbeddings from Jaiiiiii +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_jaiiiiii +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_jaiiiiii` is a English model originally trained by Jaiiiiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jaiiiiii_en_5.5.0_3.0_1726943731472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_jaiiiiii_en_5.5.0_3.0_1726943731472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_jaiiiiii","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_jaiiiiii","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_jaiiiiii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.3 MB| + +## References + +https://huggingface.co/Jaiiiiii/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_vulture_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_vulture_en.md new file mode 100644 index 00000000000000..d8332a37060d06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_vulture_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_vulture RoBertaEmbeddings from vulture +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_vulture +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_vulture` is a English model originally trained by vulture. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_vulture_en_5.5.0_3.0_1726934244530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_vulture_en_5.5.0_3.0_1726934244530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_vulture","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_vulture","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_vulture| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/vulture/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model2_amv146_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model2_amv146_en.md new file mode 100644 index 00000000000000..5fa2dd9591ed15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model2_amv146_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model2_amv146 DistilBertForSequenceClassification from amv146 +author: John Snow Labs +name: burmese_awesome_model2_amv146 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model2_amv146` is a English model originally trained by amv146. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_amv146_en_5.5.0_3.0_1726888930652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_amv146_en_5.5.0_3.0_1726888930652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model2_amv146","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model2_amv146", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model2_amv146| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/amv146/my_awesome_model2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_5cean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_5cean_pipeline_en.md new file mode 100644 index 00000000000000..eef05a426534d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_5cean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_5cean_pipeline pipeline DistilBertForSequenceClassification from 5cean +author: John Snow Labs +name: burmese_awesome_model_5cean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_5cean_pipeline` is a English model originally trained by 5cean. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_5cean_pipeline_en_5.5.0_3.0_1726953474246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_5cean_pipeline_en_5.5.0_3.0_1726953474246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_5cean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_5cean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_5cean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/5cean/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_pipeline_en.md new file mode 100644 index 00000000000000..5e2b82e6f64e8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_riaraju_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_riaraju_pipeline pipeline DistilBertForSequenceClassification from riaraju +author: John Snow Labs +name: burmese_awesome_model_riaraju_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_riaraju_pipeline` is a English model originally trained by riaraju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_riaraju_pipeline_en_5.5.0_3.0_1726884947900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_riaraju_pipeline_en_5.5.0_3.0_1726884947900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_riaraju_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_riaraju_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_riaraju_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/riaraju/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tomchristensen474_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tomchristensen474_pipeline_en.md new file mode 100644 index 00000000000000..bf8798cd9a8a32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_tomchristensen474_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_tomchristensen474_pipeline pipeline DistilBertForSequenceClassification from TomChristensen474 +author: John Snow Labs +name: burmese_awesome_model_tomchristensen474_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_tomchristensen474_pipeline` is a English model originally trained by TomChristensen474. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tomchristensen474_pipeline_en_5.5.0_3.0_1726924016705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tomchristensen474_pipeline_en_5.5.0_3.0_1726924016705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_tomchristensen474_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_tomchristensen474_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_tomchristensen474_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TomChristensen474/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cagbert_base_fl32_checkpoint_15852_de.md b/docs/_posts/ahmedlone127/2024-09-21-cagbert_base_fl32_checkpoint_15852_de.md new file mode 100644 index 00000000000000..dc66d32216a527 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cagbert_base_fl32_checkpoint_15852_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German cagbert_base_fl32_checkpoint_15852 BertForTokenClassification from MSey +author: John Snow Labs +name: cagbert_base_fl32_checkpoint_15852 +date: 2024-09-21 +tags: [de, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cagbert_base_fl32_checkpoint_15852` is a German model originally trained by MSey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cagbert_base_fl32_checkpoint_15852_de_5.5.0_3.0_1726890042532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cagbert_base_fl32_checkpoint_15852_de_5.5.0_3.0_1726890042532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("cagbert_base_fl32_checkpoint_15852","de") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("cagbert_base_fl32_checkpoint_15852", "de") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cagbert_base_fl32_checkpoint_15852| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|409.8 MB| + +## References + +https://huggingface.co/MSey/CaGBERT-base_fl32_checkpoint-15852 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-cooking_info_need_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-21-cooking_info_need_classifier_en.md new file mode 100644 index 00000000000000..21bd477de37b4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-cooking_info_need_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cooking_info_need_classifier BertForSequenceClassification from AlexFr +author: John Snow Labs +name: cooking_info_need_classifier +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cooking_info_need_classifier` is a English model originally trained by AlexFr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cooking_info_need_classifier_en_5.5.0_3.0_1726902517024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cooking_info_need_classifier_en_5.5.0_3.0_1726902517024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("cooking_info_need_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("cooking_info_need_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cooking_info_need_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/AlexFr/cooking-info-need-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-delivery_distilbert_base_uncased_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-delivery_distilbert_base_uncased_v1_pipeline_en.md new file mode 100644 index 00000000000000..53535223b93f76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-delivery_distilbert_base_uncased_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English delivery_distilbert_base_uncased_v1_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: delivery_distilbert_base_uncased_v1_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delivery_distilbert_base_uncased_v1_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delivery_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1726953028387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delivery_distilbert_base_uncased_v1_pipeline_en_5.5.0_3.0_1726953028387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("delivery_distilbert_base_uncased_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("delivery_distilbert_base_uncased_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delivery_distilbert_base_uncased_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/delivery-distilbert-base-uncased-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-demo_whisper_en.md b/docs/_posts/ahmedlone127/2024-09-21-demo_whisper_en.md new file mode 100644 index 00000000000000..6b8a62467bb913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-demo_whisper_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English demo_whisper WhisperForCTC from yash072 +author: John Snow Labs +name: demo_whisper +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`demo_whisper` is a English model originally trained by yash072. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/demo_whisper_en_5.5.0_3.0_1726877747985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/demo_whisper_en_5.5.0_3.0_1726877747985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("demo_whisper","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("demo_whisper", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|demo_whisper| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/yash072/demo_whisper \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_agnews_padding30model_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_agnews_padding30model_en.md new file mode 100644 index 00000000000000..bfeb7306b8e0f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_agnews_padding30model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_agnews_padding30model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding30model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding30model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding30model_en_5.5.0_3.0_1726953317712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding30model_en_5.5.0_3.0_1726953317712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding30model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding30model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding30model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding30model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_pipeline_xx.md new file mode 100644 index 00000000000000..4b2b9c306db9e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_multilingual_cased_finetuned_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_finetuned_pipeline pipeline DistilBertForSequenceClassification from Esmail275 +author: John Snow Labs +name: distilbert_base_multilingual_cased_finetuned_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_finetuned_pipeline` is a Multilingual model originally trained by Esmail275. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_pipeline_xx_5.5.0_3.0_1726924353140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_finetuned_pipeline_xx_5.5.0_3.0_1726924353140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_finetuned_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_finetuned_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.7 MB| + +## References + +https://huggingface.co/Esmail275/distilbert-base-multilingual-cased-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en.md new file mode 100644 index 00000000000000..850d25e1390bde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en_5.5.0_3.0_1726923883088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_en_5.5.0_3.0_1726923883088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en.md new file mode 100644 index 00000000000000..11826b18e53b0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726923896313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726923896313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut72ut7_plain_simsp_clean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en.md new file mode 100644 index 00000000000000..5356cd54107596 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_penguinyeh DistilBertForSequenceClassification from penguinyeh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_penguinyeh +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_penguinyeh` is a English model originally trained by penguinyeh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en_5.5.0_3.0_1726924361395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_penguinyeh_en_5.5.0_3.0_1726924361395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_penguinyeh","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_penguinyeh", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_penguinyeh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/penguinyeh/distilbert-base-uncased-finetuned-adl_hw1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en.md new file mode 100644 index 00000000000000..0f94decb602274 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_dasooo_pipeline pipeline DistilBertForSequenceClassification from daSooo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_dasooo_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_dasooo_pipeline` is a English model originally trained by daSooo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en_5.5.0_3.0_1726888877540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_dasooo_pipeline_en_5.5.0_3.0_1726888877540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dasooo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_dasooo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_dasooo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/daSooo/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_jiogenes_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_jiogenes_en.md new file mode 100644 index 00000000000000..f70af11c93993d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_jiogenes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jiogenes DistilBertForSequenceClassification from jiogenes +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jiogenes +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jiogenes` is a English model originally trained by jiogenes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jiogenes_en_5.5.0_3.0_1726953188262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jiogenes_en_5.5.0_3.0_1726953188262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jiogenes","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jiogenes", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jiogenes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jiogenes/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en.md new file mode 100644 index 00000000000000..6e1fb494825e45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_rorra_pipeline pipeline DistilBertForSequenceClassification from rorra +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_rorra_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_rorra_pipeline` is a English model originally trained by rorra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en_5.5.0_3.0_1726924069293.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_rorra_pipeline_en_5.5.0_3.0_1726924069293.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rorra_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_rorra_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_rorra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rorra/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en.md new file mode 100644 index 00000000000000..2dc6b6d373d870 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline pipeline DistilBertForSequenceClassification from Vicman229 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline` is a English model originally trained by Vicman229. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en_5.5.0_3.0_1726884681513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline_en_5.5.0_3.0_1726884681513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst_2_english_tuning_amazon_baby_5000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Vicman229/distilbert-base-uncased-finetuned-sst-2-english-tuning-amazon-baby-5000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en.md new file mode 100644 index 00000000000000..9687a297d1e1dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en_5.5.0_3.0_1726884786154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_en_5.5.0_3.0_1726884786154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large10PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en.md new file mode 100644 index 00000000000000..f398150cfcaf96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en_5.5.0_3.0_1726884765202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp_en_5.5.0_3.0_1726884765202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large53pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large53PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en.md new file mode 100644 index 00000000000000..2ec578d671179f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_rile_v1_fully_frozen_pipeline pipeline DistilBertForSequenceClassification from kghanlon +author: John Snow Labs +name: distilbert_base_uncased_rile_v1_fully_frozen_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_rile_v1_fully_frozen_pipeline` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en_5.5.0_3.0_1726924448372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_rile_v1_fully_frozen_pipeline_en_5.5.0_3.0_1726924448372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_rile_v1_fully_frozen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_rile_v1_fully_frozen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_rile_v1_fully_frozen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kghanlon/distilbert-base-uncased-RILE-v1_fully_frozen + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_en.md new file mode 100644 index 00000000000000..2dc77db2b7d492 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_refine_cl_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_refine_cl DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_refine_cl +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_refine_cl` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_refine_cl_en_5.5.0_3.0_1726924300028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_refine_cl_en_5.5.0_3.0_1726924300028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_refine_cl","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_refine_cl", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_refine_cl| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_refine_cl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en.md new file mode 100644 index 00000000000000..d6f94cf2a9ea29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en_5.5.0_3.0_1726952952240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline_en_5.5.0_3.0_1726952952240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut102ut1_plainprefix_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut102ut1_plainPrefix_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en.md new file mode 100644 index 00000000000000..18c7e16d3d0789 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en_5.5.0_3.0_1726884911510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_en_5.5.0_3.0_1726884911510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut7_plain_sp_clean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en.md new file mode 100644 index 00000000000000..2f1557953875cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en_5.5.0_3.0_1726884923526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline_en_5.5.0_3.0_1726884923526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut7_plain_sp_clean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut7_plain_sp_clean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en.md new file mode 100644 index 00000000000000..d6d1763e28cdac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726889111995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline_en_5.5.0_3.0_1726889111995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_utility_zphr_0st_ut72ut7_plain_simsp_clean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_food_vespinoza_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_food_vespinoza_pipeline_en.md new file mode 100644 index 00000000000000..974bdbcd582dc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_food_vespinoza_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_food_vespinoza_pipeline pipeline DistilBertForSequenceClassification from vespinoza +author: John Snow Labs +name: distilbert_food_vespinoza_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_food_vespinoza_pipeline` is a English model originally trained by vespinoza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_food_vespinoza_pipeline_en_5.5.0_3.0_1726924226174.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_food_vespinoza_pipeline_en_5.5.0_3.0_1726924226174.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_food_vespinoza_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_food_vespinoza_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_food_vespinoza_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vespinoza/distilbert-food + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_mouse_enhancers_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_mouse_enhancers_en.md new file mode 100644 index 00000000000000..f3047b3678a5b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_mouse_enhancers_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_mouse_enhancers DistilBertForSequenceClassification from addykan +author: John Snow Labs +name: distilbert_mouse_enhancers +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_mouse_enhancers` is a English model originally trained by addykan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_mouse_enhancers_en_5.5.0_3.0_1726953290100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_mouse_enhancers_en_5.5.0_3.0_1726953290100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_mouse_enhancers","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_mouse_enhancers", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_mouse_enhancers| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/addykan/distilbert-mouse-enhancers \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en.md new file mode 100644 index 00000000000000..4241b24339e645 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en_5.5.0_3.0_1726953485436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline_en_5.5.0_3.0_1726953485436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_384_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_cola_384 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en.md new file mode 100644 index 00000000000000..9c99a0473a5dfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline pipeline DistilBertForSequenceClassification from Lifehouse +author: John Snow Labs +name: distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline` is a English model originally trained by Lifehouse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en_5.5.0_3.0_1726923728799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline_en_5.5.0_3.0_1726923728799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sql_timeout_classifier_with_trained_tokenizer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|267.0 MB| + +## References + +https://huggingface.co/Lifehouse/distilbert-sql-timeout-classifier-with-trained-tokenizer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding100model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding100model_pipeline_en.md new file mode 100644 index 00000000000000..7e401e62da65b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding100model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_twitterfin_padding100model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding100model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding100model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding100model_pipeline_en_5.5.0_3.0_1726953383698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding100model_pipeline_en_5.5.0_3.0_1726953383698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_twitterfin_padding100model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_twitterfin_padding100model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding100model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding100model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_en.md new file mode 100644 index 00000000000000..9cfa916c4f682a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_twitterfin_padding10model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding10model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding10model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding10model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding10model_en_5.5.0_3.0_1726888596795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding10model_en_5.5.0_3.0_1726888596795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding10model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding10model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding10model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding10model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_fy.md b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_fy.md new file mode 100644 index 00000000000000..de5221aa7c9385 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_fy.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Western Frisian distilft_frisian_1hd WhisperForCTC from Pageee +author: John Snow Labs +name: distilft_frisian_1hd +date: 2024-09-21 +tags: [fy, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: fy +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilft_frisian_1hd` is a Western Frisian model originally trained by Pageee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilft_frisian_1hd_fy_5.5.0_3.0_1726903833514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilft_frisian_1hd_fy_5.5.0_3.0_1726903833514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("distilft_frisian_1hd","fy") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("distilft_frisian_1hd", "fy") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilft_frisian_1hd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|fy| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Pageee/DistilFT-Frisian-1hd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_pipeline_fy.md b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_pipeline_fy.md new file mode 100644 index 00000000000000..87232b72e2e131 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilft_frisian_1hd_pipeline_fy.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Western Frisian distilft_frisian_1hd_pipeline pipeline WhisperForCTC from Pageee +author: John Snow Labs +name: distilft_frisian_1hd_pipeline +date: 2024-09-21 +tags: [fy, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fy +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilft_frisian_1hd_pipeline` is a Western Frisian model originally trained by Pageee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilft_frisian_1hd_pipeline_fy_5.5.0_3.0_1726903893132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilft_frisian_1hd_pipeline_fy_5.5.0_3.0_1726903893132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilft_frisian_1hd_pipeline", lang = "fy") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilft_frisian_1hd_pipeline", lang = "fy") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilft_frisian_1hd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fy| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Pageee/DistilFT-Frisian-1hd + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distill_pi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distill_pi_pipeline_en.md new file mode 100644 index 00000000000000..30205287c3c84a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distill_pi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distill_pi_pipeline pipeline DistilBertForSequenceClassification from dawidmt +author: John Snow Labs +name: distill_pi_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distill_pi_pipeline` is a English model originally trained by dawidmt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distill_pi_pipeline_en_5.5.0_3.0_1726953369769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distill_pi_pipeline_en_5.5.0_3.0_1726953369769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distill_pi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distill_pi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distill_pi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dawidmt/distill_pi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en.md new file mode 100644 index 00000000000000..c8056db63f4c14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_giannifiore_pipeline pipeline RoBertaEmbeddings from giannifiore +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_giannifiore_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_giannifiore_pipeline` is a English model originally trained by giannifiore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en_5.5.0_3.0_1726942235990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_giannifiore_pipeline_en_5.5.0_3.0_1726942235990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_wikitext2_giannifiore_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_wikitext2_giannifiore_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_giannifiore_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/giannifiore/distilroberta-base-finetuned-wikitext2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_climateskeptics_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_climateskeptics_pipeline_en.md new file mode 100644 index 00000000000000..7919c4a3fe2447 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_climateskeptics_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_climateskeptics_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_climateskeptics_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_climateskeptics_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_climateskeptics_pipeline_en_5.5.0_3.0_1726943719532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_climateskeptics_pipeline_en_5.5.0_3.0_1726943719532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_climateskeptics_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_climateskeptics_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_climateskeptics_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-climateskeptics + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_consoom_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_consoom_en.md new file mode 100644 index 00000000000000..3b2e1576afe683 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_ft_consoom_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_consoom RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_consoom +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_consoom` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_consoom_en_5.5.0_3.0_1726943857114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_consoom_en_5.5.0_3.0_1726943857114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_consoom","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_consoom","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_consoom| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-consoom \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_test_pipeline_en.md new file mode 100644 index 00000000000000..2e6b2f024d2804 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_test_pipeline pipeline RoBertaForSequenceClassification from yu-jia-wang +author: John Snow Labs +name: distilroberta_base_test_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_test_pipeline` is a English model originally trained by yu-jia-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_test_pipeline_en_5.5.0_3.0_1726940689856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_test_pipeline_en_5.5.0_3.0_1726940689856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.5 MB| + +## References + +https://huggingface.co/yu-jia-wang/distilroberta-base-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_topic_classification_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_topic_classification_3_pipeline_en.md new file mode 100644 index 00000000000000..f328a55821b159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_topic_classification_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_topic_classification_3_pipeline pipeline RoBertaForSequenceClassification from abdulmatinomotoso +author: John Snow Labs +name: distilroberta_topic_classification_3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_topic_classification_3_pipeline` is a English model originally trained by abdulmatinomotoso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_topic_classification_3_pipeline_en_5.5.0_3.0_1726940706676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_topic_classification_3_pipeline_en_5.5.0_3.0_1726940706676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_topic_classification_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_topic_classification_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_topic_classification_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.4 MB| + +## References + +https://huggingface.co/abdulmatinomotoso/distilroberta-topic-classification_3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..a1f19f1f77af80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726900206240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726900206240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed0_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed0-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en.md new file mode 100644 index 00000000000000..7d7fd69b21bf3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline pipeline RoBertaForSequenceClassification from adnanakbr +author: John Snow Labs +name: emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline` is a English model originally trained by adnanakbr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en_5.5.0_3.0_1726940789123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline_en_5.5.0_3.0_1726940789123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_english_distilroberta_base_fine_tuned_for_amazon_english_reviews_on_200k_review_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/adnanakbr/emotion-english-distilroberta-base-fine_tuned_for_amazon_english_reviews_on_200K_review_v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-enviroduediligence_lm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-enviroduediligence_lm_pipeline_en.md new file mode 100644 index 00000000000000..6b056952eb449e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-enviroduediligence_lm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English enviroduediligence_lm_pipeline pipeline RoBertaEmbeddings from d4data +author: John Snow Labs +name: enviroduediligence_lm_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enviroduediligence_lm_pipeline` is a English model originally trained by d4data. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enviroduediligence_lm_pipeline_en_5.5.0_3.0_1726944077556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enviroduediligence_lm_pipeline_en_5.5.0_3.0_1726944077556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("enviroduediligence_lm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("enviroduediligence_lm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enviroduediligence_lm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.4 MB| + +## References + +https://huggingface.co/d4data/EnviroDueDiligence_LM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_en.md b/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_en.md new file mode 100644 index 00000000000000..054ab6d50de4c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fairlex_scotus_minilm RoBertaEmbeddings from coastalcph +author: John Snow Labs +name: fairlex_scotus_minilm +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fairlex_scotus_minilm` is a English model originally trained by coastalcph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fairlex_scotus_minilm_en_5.5.0_3.0_1726934184452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fairlex_scotus_minilm_en_5.5.0_3.0_1726934184452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("fairlex_scotus_minilm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("fairlex_scotus_minilm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fairlex_scotus_minilm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|114.0 MB| + +## References + +https://huggingface.co/coastalcph/fairlex-scotus-minilm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_pipeline_en.md new file mode 100644 index 00000000000000..5d92de2096fb97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fairlex_scotus_minilm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fairlex_scotus_minilm_pipeline pipeline RoBertaEmbeddings from coastalcph +author: John Snow Labs +name: fairlex_scotus_minilm_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fairlex_scotus_minilm_pipeline` is a English model originally trained by coastalcph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fairlex_scotus_minilm_pipeline_en_5.5.0_3.0_1726934189708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fairlex_scotus_minilm_pipeline_en_5.5.0_3.0_1726934189708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fairlex_scotus_minilm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fairlex_scotus_minilm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fairlex_scotus_minilm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|114.0 MB| + +## References + +https://huggingface.co/coastalcph/fairlex-scotus-minilm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..3016ff1f2591c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_distilbert_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English fine_tuned_distilbert_pipeline pipeline DistilBertForQuestionAnswering from Roamify +author: John Snow Labs +name: fine_tuned_distilbert_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_distilbert_pipeline` is a English model originally trained by Roamify. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_pipeline_en_5.5.0_3.0_1726924135235.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_distilbert_pipeline_en_5.5.0_3.0_1726924135235.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("fine_tuned_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("fine_tuned_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/Roamify/fine-tuned-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_en.md new file mode 100644 index 00000000000000..b893041370da78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_diegodelvalle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_diegodelvalle DistilBertForSequenceClassification from DiegodelValle +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_diegodelvalle +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_diegodelvalle` is a English model originally trained by DiegodelValle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_diegodelvalle_en_5.5.0_3.0_1726884672597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_diegodelvalle_en_5.5.0_3.0_1726884672597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_diegodelvalle","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_diegodelvalle", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_diegodelvalle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DiegodelValle/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_en.md new file mode 100644 index 00000000000000..5d132c1b469abe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ramabhishek DistilBertForSequenceClassification from RamAbhishek +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ramabhishek +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ramabhishek` is a English model originally trained by RamAbhishek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramabhishek_en_5.5.0_3.0_1726953643510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramabhishek_en_5.5.0_3.0_1726953643510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ramabhishek","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ramabhishek", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ramabhishek| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RamAbhishek/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en.md new file mode 100644 index 00000000000000..731156cd834cc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ramabhishek_pipeline pipeline DistilBertForSequenceClassification from RamAbhishek +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ramabhishek_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ramabhishek_pipeline` is a English model originally trained by RamAbhishek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en_5.5.0_3.0_1726953655537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramabhishek_pipeline_en_5.5.0_3.0_1726953655537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ramabhishek_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ramabhishek_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ramabhishek_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RamAbhishek/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fintunned_v2_roberta_irish_en.md b/docs/_posts/ahmedlone127/2024-09-21-fintunned_v2_roberta_irish_en.md new file mode 100644 index 00000000000000..7a8272fb20276c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fintunned_v2_roberta_irish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fintunned_v2_roberta_irish RoBertaForSequenceClassification from nebiyu29 +author: John Snow Labs +name: fintunned_v2_roberta_irish +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fintunned_v2_roberta_irish` is a English model originally trained by nebiyu29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fintunned_v2_roberta_irish_en_5.5.0_3.0_1726900225486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fintunned_v2_roberta_irish_en_5.5.0_3.0_1726900225486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("fintunned_v2_roberta_irish","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fintunned_v2_roberta_irish", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fintunned_v2_roberta_irish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.3 MB| + +## References + +https://huggingface.co/nebiyu29/fintunned-v2-roberta_GA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ft_english_10maa_en.md b/docs/_posts/ahmedlone127/2024-09-21-ft_english_10maa_en.md new file mode 100644 index 00000000000000..c15a6c8cd22851 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ft_english_10maa_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English ft_english_10maa WhisperForCTC from Pageee +author: John Snow Labs +name: ft_english_10maa +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ft_english_10maa` is a English model originally trained by Pageee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ft_english_10maa_en_5.5.0_3.0_1726912918803.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ft_english_10maa_en_5.5.0_3.0_1726912918803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("ft_english_10maa","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("ft_english_10maa", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ft_english_10maa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Pageee/FT-English-10maa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-homework001_en.md b/docs/_posts/ahmedlone127/2024-09-21-homework001_en.md new file mode 100644 index 00000000000000..fbb998efcfbf5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-homework001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English homework001 DistilBertForSequenceClassification from andyWuTw +author: John Snow Labs +name: homework001 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`homework001` is a English model originally trained by andyWuTw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/homework001_en_5.5.0_3.0_1726923742874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/homework001_en_5.5.0_3.0_1726923742874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("homework001","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("homework001", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|homework001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andyWuTw/homework001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-inlegalbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-inlegalbert_pipeline_en.md new file mode 100644 index 00000000000000..4e98da507994ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-inlegalbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English inlegalbert_pipeline pipeline BertForSequenceClassification from xshubhamx +author: John Snow Labs +name: inlegalbert_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inlegalbert_pipeline` is a English model originally trained by xshubhamx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inlegalbert_pipeline_en_5.5.0_3.0_1726956490522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inlegalbert_pipeline_en_5.5.0_3.0_1726956490522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("inlegalbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("inlegalbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inlegalbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/xshubhamx/InLegalBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-malayalam_news_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-21-malayalam_news_classifier_en.md new file mode 100644 index 00000000000000..0bb462a45071cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-malayalam_news_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English malayalam_news_classifier XlmRoBertaForSequenceClassification from rajeshradhakrishnan +author: John Snow Labs +name: malayalam_news_classifier +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malayalam_news_classifier` is a English model originally trained by rajeshradhakrishnan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malayalam_news_classifier_en_5.5.0_3.0_1726918471323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malayalam_news_classifier_en_5.5.0_3.0_1726918471323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("malayalam_news_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("malayalam_news_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malayalam_news_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|777.4 MB| + +## References + +https://huggingface.co/rajeshradhakrishnan/malayalam_news_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en.md b/docs/_posts/ahmedlone127/2024-09-21-minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en.md new file mode 100644 index 00000000000000..6e9d4b94ff8b3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103 RoBertaEmbeddings from saghar +author: John Snow Labs +name: minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103` is a English model originally trained by saghar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en_5.5.0_3.0_1726943613547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103_en_5.5.0_3.0_1726943613547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minilmv2_l6_h768_distilled_from_roberta_large_finetuned_wikitext103| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|196.0 MB| + +## References + +https://huggingface.co/saghar/MiniLMv2-L6-H768-distilled-from-RoBERTa-Large-finetuned-wikitext103 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-model_1_cannotbolt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-model_1_cannotbolt_pipeline_en.md new file mode 100644 index 00000000000000..04929a6fd03590 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-model_1_cannotbolt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_1_cannotbolt_pipeline pipeline BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_1_cannotbolt_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_1_cannotbolt_pipeline` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_1_cannotbolt_pipeline_en_5.5.0_3.0_1726956324979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_1_cannotbolt_pipeline_en_5.5.0_3.0_1726956324979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_1_cannotbolt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_1_cannotbolt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_1_cannotbolt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-model_mental_williamywy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-model_mental_williamywy_pipeline_en.md new file mode 100644 index 00000000000000..95f71ad5eceb65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-model_mental_williamywy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_mental_williamywy_pipeline pipeline DistilBertForSequenceClassification from WilliamYWY +author: John Snow Labs +name: model_mental_williamywy_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_mental_williamywy_pipeline` is a English model originally trained by WilliamYWY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_mental_williamywy_pipeline_en_5.5.0_3.0_1726923790224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_mental_williamywy_pipeline_en_5.5.0_3.0_1726923790224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_mental_williamywy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_mental_williamywy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_mental_williamywy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/WilliamYWY/model_mental + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_en.md b/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_en.md new file mode 100644 index 00000000000000..9583ed4c2a64dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst2_padding10model_realgon DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst2_padding10model_realgon +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst2_padding10model_realgon` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding10model_realgon_en_5.5.0_3.0_1726953628545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding10model_realgon_en_5.5.0_3.0_1726953628545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst2_padding10model_realgon","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst2_padding10model_realgon", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst2_padding10model_realgon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst2_padding10model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_pipeline_en.md new file mode 100644 index 00000000000000..474b0daa1a4342 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-n_distilbert_sst2_padding10model_realgon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst2_padding10model_realgon_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst2_padding10model_realgon_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst2_padding10model_realgon_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding10model_realgon_pipeline_en_5.5.0_3.0_1726953640500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding10model_realgon_pipeline_en_5.5.0_3.0_1726953640500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_sst2_padding10model_realgon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_sst2_padding10model_realgon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst2_padding10model_realgon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst2_padding10model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_amirai24_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_amirai24_pipeline_en.md new file mode 100644 index 00000000000000..e58c225a8df420 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_amirai24_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_hf_workshop_amirai24_pipeline pipeline DistilBertForSequenceClassification from AmirAI24 +author: John Snow Labs +name: nlp_hf_workshop_amirai24_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop_amirai24_pipeline` is a English model originally trained by AmirAI24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_amirai24_pipeline_en_5.5.0_3.0_1726884839938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_amirai24_pipeline_en_5.5.0_3.0_1726884839938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp_hf_workshop_amirai24_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp_hf_workshop_amirai24_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop_amirai24_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/AmirAI24/NLP_HF_Workshop + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_mahdi_fathian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_mahdi_fathian_pipeline_en.md new file mode 100644 index 00000000000000..872b4c6a7f2806 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-nlp_hf_workshop_mahdi_fathian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_hf_workshop_mahdi_fathian_pipeline pipeline DistilBertForSequenceClassification from mahdi-fathian +author: John Snow Labs +name: nlp_hf_workshop_mahdi_fathian_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop_mahdi_fathian_pipeline` is a English model originally trained by mahdi-fathian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_mahdi_fathian_pipeline_en_5.5.0_3.0_1726885029415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop_mahdi_fathian_pipeline_en_5.5.0_3.0_1726885029415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp_hf_workshop_mahdi_fathian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp_hf_workshop_mahdi_fathian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop_mahdi_fathian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/mahdi-fathian/NLP_HF_Workshop + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_small_zoomrx_colab_1_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_small_zoomrx_colab_1_pipeline_xx.md new file mode 100644 index 00000000000000..d548faf807f00d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-openai_whisper_small_zoomrx_colab_1_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual openai_whisper_small_zoomrx_colab_1_pipeline pipeline WhisperForCTC from PraveenJesu +author: John Snow Labs +name: openai_whisper_small_zoomrx_colab_1_pipeline +date: 2024-09-21 +tags: [xx, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`openai_whisper_small_zoomrx_colab_1_pipeline` is a Multilingual model originally trained by PraveenJesu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/openai_whisper_small_zoomrx_colab_1_pipeline_xx_5.5.0_3.0_1726939315984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/openai_whisper_small_zoomrx_colab_1_pipeline_xx_5.5.0_3.0_1726939315984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("openai_whisper_small_zoomrx_colab_1_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("openai_whisper_small_zoomrx_colab_1_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|openai_whisper_small_zoomrx_colab_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.1 GB| + +## References + +https://huggingface.co/PraveenJesu/openai-whisper-small-zoomrx-colab-1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-pretrained_distilroberta_on_ireland_tweets_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-pretrained_distilroberta_on_ireland_tweets_pipeline_en.md new file mode 100644 index 00000000000000..44b70ef05fb4cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-pretrained_distilroberta_on_ireland_tweets_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pretrained_distilroberta_on_ireland_tweets_pipeline pipeline RoBertaEmbeddings from mitra-mir +author: John Snow Labs +name: pretrained_distilroberta_on_ireland_tweets_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pretrained_distilroberta_on_ireland_tweets_pipeline` is a English model originally trained by mitra-mir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pretrained_distilroberta_on_ireland_tweets_pipeline_en_5.5.0_3.0_1726934070122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pretrained_distilroberta_on_ireland_tweets_pipeline_en_5.5.0_3.0_1726934070122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pretrained_distilroberta_on_ireland_tweets_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pretrained_distilroberta_on_ireland_tweets_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pretrained_distilroberta_on_ireland_tweets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.8 MB| + +## References + +https://huggingface.co/mitra-mir/pretrained-distilroberta-on-ireland-tweets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_en.md b/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_en.md new file mode 100644 index 00000000000000..7aab138a6af585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English racism_finetuned_detests_wandb RoBertaForSequenceClassification from Pablo94 +author: John Snow Labs +name: racism_finetuned_detests_wandb +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`racism_finetuned_detests_wandb` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/racism_finetuned_detests_wandb_en_5.5.0_3.0_1726940903940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/racism_finetuned_detests_wandb_en_5.5.0_3.0_1726940903940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("racism_finetuned_detests_wandb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("racism_finetuned_detests_wandb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|racism_finetuned_detests_wandb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|449.6 MB| + +## References + +https://huggingface.co/Pablo94/racism-finetuned-detests-wandb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en.md new file mode 100644 index 00000000000000..62b7ebb9ddfc93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline pipeline RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en_5.5.0_3.0_1726940368627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline_en_5.5.0_3.0_1726940368627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_pais_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-TripAdvisorDomainAdaptation-finetuned-e2-RestMex2023-pais + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_47_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_47_en.md new file mode 100644 index 00000000000000..f1c8e66fb3799b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_epoch_47_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_47 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_47 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_47` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_47_en_5.5.0_3.0_1726957889283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_47_en_5.5.0_3.0_1726957889283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_47","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_47","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_47| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_47 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en.md new file mode 100644 index 00000000000000..2c543d6d8fe86f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456 RoBertaForTokenClassification from yokesh456 +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456` is a English model originally trained by yokesh456. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en_5.5.0_3.0_1726926826311.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456_en_5.5.0_3.0_1726926826311.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_yokesh456| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/yokesh456/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_en.md new file mode 100644 index 00000000000000..b840a672133704 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_tweet_topic_multi_2020 RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_large_tweet_topic_multi_2020 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_tweet_topic_multi_2020` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_tweet_topic_multi_2020_en_5.5.0_3.0_1726940968585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_tweet_topic_multi_2020_en_5.5.0_3.0_1726940968585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_tweet_topic_multi_2020","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_tweet_topic_multi_2020", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_tweet_topic_multi_2020| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/cardiffnlp/roberta-large-tweet-topic-multi-2020 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_pipeline_en.md new file mode 100644 index 00000000000000..d8118bd9dfa950 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_news_cnn_dailymail_pipeline pipeline RoBertaEmbeddings from isarth +author: John Snow Labs +name: roberta_news_cnn_dailymail_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_news_cnn_dailymail_pipeline` is a English model originally trained by isarth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_news_cnn_dailymail_pipeline_en_5.5.0_3.0_1726943474597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_news_cnn_dailymail_pipeline_en_5.5.0_3.0_1726943474597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_news_cnn_dailymail_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_news_cnn_dailymail_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_news_cnn_dailymail_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/isarth/roberta-news-cnn_dailymail + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_v2_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_v2_en.md new file mode 100644 index 00000000000000..913411eb008f9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_v2 RoBertaForTokenClassification from token-classifier +author: John Snow Labs +name: roberta_v2 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_v2` is a English model originally trained by token-classifier. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_v2_en_5.5.0_3.0_1726926649407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_v2_en_5.5.0_3.0_1726926649407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/token-classifier/roBERTa-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robertabaseallenai_ppt_occitan_en.md b/docs/_posts/ahmedlone127/2024-09-21-robertabaseallenai_ppt_occitan_en.md new file mode 100644 index 00000000000000..17f15ae77dd2bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robertabaseallenai_ppt_occitan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertabaseallenai_ppt_occitan RoBertaEmbeddings from mehrshadk +author: John Snow Labs +name: robertabaseallenai_ppt_occitan +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertabaseallenai_ppt_occitan` is a English model originally trained by mehrshadk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertabaseallenai_ppt_occitan_en_5.5.0_3.0_1726882217317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertabaseallenai_ppt_occitan_en_5.5.0_3.0_1726882217317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robertabaseallenai_ppt_occitan","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robertabaseallenai_ppt_occitan","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertabaseallenai_ppt_occitan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/mehrshadk/robertaBaseAllenAI_ppt_OC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_en.md b/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_en.md new file mode 100644 index 00000000000000..d5de39062f7e53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robit_roberta_base_italian_aida_upm RoBertaEmbeddings from AIDA-UPM +author: John Snow Labs +name: robit_roberta_base_italian_aida_upm +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robit_roberta_base_italian_aida_upm` is a English model originally trained by AIDA-UPM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robit_roberta_base_italian_aida_upm_en_5.5.0_3.0_1726882369761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robit_roberta_base_italian_aida_upm_en_5.5.0_3.0_1726882369761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robit_roberta_base_italian_aida_upm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robit_roberta_base_italian_aida_upm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robit_roberta_base_italian_aida_upm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/AIDA-UPM/robit-roberta-base-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_pipeline_en.md new file mode 100644 index 00000000000000..1f3b01d192a27f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robit_roberta_base_italian_aida_upm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robit_roberta_base_italian_aida_upm_pipeline pipeline RoBertaEmbeddings from AIDA-UPM +author: John Snow Labs +name: robit_roberta_base_italian_aida_upm_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robit_roberta_base_italian_aida_upm_pipeline` is a English model originally trained by AIDA-UPM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robit_roberta_base_italian_aida_upm_pipeline_en_5.5.0_3.0_1726882398728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robit_roberta_base_italian_aida_upm_pipeline_en_5.5.0_3.0_1726882398728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robit_roberta_base_italian_aida_upm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robit_roberta_base_italian_aida_upm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robit_roberta_base_italian_aida_upm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/AIDA-UPM/robit-roberta-base-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-saved_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-saved_model_en.md new file mode 100644 index 00000000000000..6028522afbe64b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-saved_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English saved_model DistilBertForSequenceClassification from hanyp +author: John Snow Labs +name: saved_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`saved_model` is a English model originally trained by hanyp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/saved_model_en_5.5.0_3.0_1726888953733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/saved_model_en_5.5.0_3.0_1726888953733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("saved_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("saved_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|saved_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hanyp/saved_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_pipeline_ar.md new file mode 100644 index 00000000000000..f20f7e8fa2cb91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v2_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v2_pipeline +date: 2024-09-21 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v2_pipeline` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v2_pipeline_ar_5.5.0_3.0_1726941903084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v2_pipeline_ar_5.5.0_3.0_1726941903084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_arabertmo_base_v2_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_arabertmo_base_v2_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v4_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v4_pipeline_ar.md new file mode 100644 index 00000000000000..b65ceb2ef5ff4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v4_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v4_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v4_pipeline +date: 2024-09-21 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v4_pipeline` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v4_pipeline_ar_5.5.0_3.0_1726913772927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v4_pipeline_ar_5.5.0_3.0_1726913772927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_arabertmo_base_v4_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_arabertmo_base_v4_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v7_pipeline_en.md new file mode 100644 index 00000000000000..ca18a4380deab2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v7_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_arabertmo_base_v7_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v7_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v7_pipeline` is a English model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v7_pipeline_en_5.5.0_3.0_1726913851033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v7_pipeline_en_5.5.0_3.0_1726913851033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_arabertmo_base_v7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_arabertmo_base_v7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_pipeline_en.md new file mode 100644 index 00000000000000..6675c94790a25e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_aristoberto_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_aristoberto_pipeline pipeline BertSentenceEmbeddings from Jacobo +author: John Snow Labs +name: sent_aristoberto_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_aristoberto_pipeline` is a English model originally trained by Jacobo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_aristoberto_pipeline_en_5.5.0_3.0_1726941699256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_aristoberto_pipeline_en_5.5.0_3.0_1726941699256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_aristoberto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_aristoberto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_aristoberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.6 MB| + +## References + +https://huggingface.co/Jacobo/aristoBERTo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_greek_modern_russian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_greek_modern_russian_cased_pipeline_en.md new file mode 100644 index 00000000000000..1d0c4a4fa5642d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_english_greek_modern_russian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_greek_modern_russian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_greek_modern_russian_cased_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_greek_modern_russian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_russian_cased_pipeline_en_5.5.0_3.0_1726941379148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_russian_cased_pipeline_en_5.5.0_3.0_1726941379148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_greek_modern_russian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_greek_modern_russian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_greek_modern_russian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|433.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-ru-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_it.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_it.md new file mode 100644 index 00000000000000..602c47dd63b3c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian sent_bert_base_italian_cased_osiria BertSentenceEmbeddings from osiria +author: John Snow Labs +name: sent_bert_base_italian_cased_osiria +date: 2024-09-21 +tags: [it, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_italian_cased_osiria` is a Italian model originally trained by osiria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_italian_cased_osiria_it_5.5.0_3.0_1726898297759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_italian_cased_osiria_it_5.5.0_3.0_1726898297759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_italian_cased_osiria","it") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_italian_cased_osiria","it") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_italian_cased_osiria| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|it| +|Size:|409.0 MB| + +## References + +https://huggingface.co/osiria/bert-base-italian-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_spanish_wwm_cased_finetuned_tweets_es.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_spanish_wwm_cased_finetuned_tweets_es.md new file mode 100644 index 00000000000000..18715288ec612e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_spanish_wwm_cased_finetuned_tweets_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish sent_bert_base_spanish_wwm_cased_finetuned_tweets BertSentenceEmbeddings from mariav +author: John Snow Labs +name: sent_bert_base_spanish_wwm_cased_finetuned_tweets +date: 2024-09-21 +tags: [es, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_spanish_wwm_cased_finetuned_tweets` is a Castilian, Spanish model originally trained by mariav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_tweets_es_5.5.0_3.0_1726941870072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_tweets_es_5.5.0_3.0_1726941870072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_spanish_wwm_cased_finetuned_tweets","es") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_spanish_wwm_cased_finetuned_tweets","es") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_spanish_wwm_cased_finetuned_tweets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|es| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mariav/bert-base-spanish-wwm-cased-finetuned-tweets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_pipeline_en.md new file mode 100644 index 00000000000000..45cbdeee945934 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_uncased_issues_128_lijingxin_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_lijingxin_pipeline pipeline BertSentenceEmbeddings from lijingxin +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_lijingxin_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_lijingxin_pipeline` is a English model originally trained by lijingxin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_lijingxin_pipeline_en_5.5.0_3.0_1726941450852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_lijingxin_pipeline_en_5.5.0_3.0_1726941450852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_lijingxin_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_lijingxin_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_lijingxin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/lijingxin/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bertimbau_large_fine_tuned_sindhi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bertimbau_large_fine_tuned_sindhi_pipeline_en.md new file mode 100644 index 00000000000000..0d1f561199303f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bertimbau_large_fine_tuned_sindhi_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bertimbau_large_fine_tuned_sindhi_pipeline pipeline BertSentenceEmbeddings from AVSilva +author: John Snow Labs +name: sent_bertimbau_large_fine_tuned_sindhi_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbau_large_fine_tuned_sindhi_pipeline` is a English model originally trained by AVSilva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_sindhi_pipeline_en_5.5.0_3.0_1726913757007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbau_large_fine_tuned_sindhi_pipeline_en_5.5.0_3.0_1726913757007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bertimbau_large_fine_tuned_sindhi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bertimbau_large_fine_tuned_sindhi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbau_large_fine_tuned_sindhi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/AVSilva/bertimbau-large-fine-tuned-sd + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_en.md new file mode 100644 index 00000000000000..2d73c9f996c0a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_defsent_bert_base_uncased_mean BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_base_uncased_mean +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_base_uncased_mean` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_mean_en_5.5.0_3.0_1726941302233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_mean_en_5.5.0_3.0_1726941302233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_base_uncased_mean","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_defsent_bert_base_uncased_mean","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_base_uncased_mean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-base-uncased-mean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_distilbertu_base_cased_0_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_distilbertu_base_cased_0_0_pipeline_en.md new file mode 100644 index 00000000000000..1068a27b7267e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_distilbertu_base_cased_0_0_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_distilbertu_base_cased_0_0_pipeline pipeline BertSentenceEmbeddings from amitness +author: John Snow Labs +name: sent_distilbertu_base_cased_0_0_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbertu_base_cased_0_0_pipeline` is a English model originally trained by amitness. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_0_pipeline_en_5.5.0_3.0_1726931774954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbertu_base_cased_0_0_pipeline_en_5.5.0_3.0_1726931774954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_distilbertu_base_cased_0_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_distilbertu_base_cased_0_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbertu_base_cased_0_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|471.0 MB| + +## References + +https://huggingface.co/amitness/distilbertu-base-cased-0.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_hindi_bert_v1_hi.md b/docs/_posts/ahmedlone127/2024-09-21-sent_hindi_bert_v1_hi.md new file mode 100644 index 00000000000000..0146f37f92571e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_hindi_bert_v1_hi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hindi sent_hindi_bert_v1 BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_hindi_bert_v1 +date: 2024-09-21 +tags: [hi, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bert_v1` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_v1_hi_5.5.0_3.0_1726941623089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_v1_hi_5.5.0_3.0_1726941623089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bert_v1","hi") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bert_v1","hi") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bert_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|hi| +|Size:|663.8 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-bert-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_logion_base_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_logion_base_en.md new file mode 100644 index 00000000000000..04a6f87beb8c5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_logion_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_logion_base BertSentenceEmbeddings from cabrooks +author: John Snow Labs +name: sent_logion_base +date: 2024-09-21 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_logion_base` is a English model originally trained by cabrooks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_logion_base_en_5.5.0_3.0_1726941302826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_logion_base_en_5.5.0_3.0_1726941302826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_logion_base","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_logion_base","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_logion_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|420.8 MB| + +## References + +https://huggingface.co/cabrooks/LOGION-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_protaugment_lm_hwu64_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_protaugment_lm_hwu64_pipeline_en.md new file mode 100644 index 00000000000000..803b6ba8e8c739 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_protaugment_lm_hwu64_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_protaugment_lm_hwu64_pipeline pipeline BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_hwu64_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_hwu64_pipeline` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_hwu64_pipeline_en_5.5.0_3.0_1726898750811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_hwu64_pipeline_en_5.5.0_3.0_1726898750811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_protaugment_lm_hwu64_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_protaugment_lm_hwu64_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_hwu64_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.1 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-HWU64 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-task_1_en.md b/docs/_posts/ahmedlone127/2024-09-21-task_1_en.md new file mode 100644 index 00000000000000..dc8db9954c45aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-task_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English task_1 DistilBertForSequenceClassification from lucando27 +author: John Snow Labs +name: task_1 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task_1` is a English model originally trained by lucando27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task_1_en_5.5.0_3.0_1726924071651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task_1_en_5.5.0_3.0_1726924071651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("task_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("task_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lucando27/Task_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-test_reward_model_en.md b/docs/_posts/ahmedlone127/2024-09-21-test_reward_model_en.md new file mode 100644 index 00000000000000..0e39b7b3a4fa56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-test_reward_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_reward_model RoBertaForSequenceClassification from Adzka +author: John Snow Labs +name: test_reward_model +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_reward_model` is a English model originally trained by Adzka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_reward_model_en_5.5.0_3.0_1726940467484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_reward_model_en_5.5.0_3.0_1726940467484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_reward_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_reward_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_reward_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/Adzka/test-reward-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-text_classifier_madanagrawal_en.md b/docs/_posts/ahmedlone127/2024-09-21-text_classifier_madanagrawal_en.md new file mode 100644 index 00000000000000..160c2ddacd9ad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-text_classifier_madanagrawal_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_classifier_madanagrawal DistilBertForSequenceClassification from madanagrawal +author: John Snow Labs +name: text_classifier_madanagrawal +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classifier_madanagrawal` is a English model originally trained by madanagrawal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classifier_madanagrawal_en_5.5.0_3.0_1726953563903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classifier_madanagrawal_en_5.5.0_3.0_1726953563903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classifier_madanagrawal","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classifier_madanagrawal", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classifier_madanagrawal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/madanagrawal/text_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-tunlangmodel_test1_10_ar.md b/docs/_posts/ahmedlone127/2024-09-21-tunlangmodel_test1_10_ar.md new file mode 100644 index 00000000000000..1efc8ca87f239e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-tunlangmodel_test1_10_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic tunlangmodel_test1_10 WhisperForCTC from Arbi-Houssem +author: John Snow Labs +name: tunlangmodel_test1_10 +date: 2024-09-21 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tunlangmodel_test1_10` is a Arabic model originally trained by Arbi-Houssem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tunlangmodel_test1_10_ar_5.5.0_3.0_1726950281584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tunlangmodel_test1_10_ar_5.5.0_3.0_1726950281584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("tunlangmodel_test1_10","ar") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("tunlangmodel_test1_10", "ar") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tunlangmodel_test1_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Arbi-Houssem/TunLangModel_test1.10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-unibert_roberta_2_en.md b/docs/_posts/ahmedlone127/2024-09-21-unibert_roberta_2_en.md new file mode 100644 index 00000000000000..2a1faf0ec10e40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-unibert_roberta_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English unibert_roberta_2 RoBertaForTokenClassification from dbala02 +author: John Snow Labs +name: unibert_roberta_2 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`unibert_roberta_2` is a English model originally trained by dbala02. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unibert_roberta_2_en_5.5.0_3.0_1726926893606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unibert_roberta_2_en_5.5.0_3.0_1726926893606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("unibert_roberta_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("unibert_roberta_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unibert_roberta_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|422.7 MB| + +## References + +https://huggingface.co/dbala02/uniBERT.RoBERTa.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_catalan_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_catalan_en.md new file mode 100644 index 00000000000000..059ff1711b80e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_catalan_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_catalan WhisperForCTC from softcatala +author: John Snow Labs +name: whisper_base_catalan +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_catalan` is a English model originally trained by softcatala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_en_5.5.0_3.0_1726878181232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_catalan_en_5.5.0_3.0_1726878181232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_catalan","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_catalan", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_catalan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|641.5 MB| + +## References + +https://huggingface.co/softcatala/whisper-base-ca \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_chinese_cn_cv9_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_chinese_cn_cv9_en.md new file mode 100644 index 00000000000000..8fbb4005a958a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_chinese_cn_cv9_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_chinese_cn_cv9 WhisperForCTC from Hydrodynamical +author: John Snow Labs +name: whisper_base_chinese_cn_cv9 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_chinese_cn_cv9` is a English model originally trained by Hydrodynamical. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cn_cv9_en_5.5.0_3.0_1726891248883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_chinese_cn_cv9_en_5.5.0_3.0_1726891248883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_chinese_cn_cv9","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_chinese_cn_cv9", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_chinese_cn_cv9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|643.6 MB| + +## References + +https://huggingface.co/Hydrodynamical/whisper-base-zh-CN-cv9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_full_data_v2_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_full_data_v2_en.md new file mode 100644 index 00000000000000..4cbe2f36649fbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_full_data_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_full_data_v2 WhisperForCTC from pphuc25 +author: John Snow Labs +name: whisper_base_full_data_v2 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_full_data_v2` is a English model originally trained by pphuc25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_v2_en_5.5.0_3.0_1726960098733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_full_data_v2_en_5.5.0_3.0_1726960098733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_full_data_v2","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_full_data_v2", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_full_data_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.1 MB| + +## References + +https://huggingface.co/pphuc25/whisper-base-full-data-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_german_cv15_v1_de.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_german_cv15_v1_de.md new file mode 100644 index 00000000000000..11b0e71b3a4c82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_german_cv15_v1_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_base_german_cv15_v1 WhisperForCTC from flozi00 +author: John Snow Labs +name: whisper_base_german_cv15_v1 +date: 2024-09-21 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_german_cv15_v1` is a German model originally trained by flozi00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_german_cv15_v1_de_5.5.0_3.0_1726907361355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_german_cv15_v1_de_5.5.0_3.0_1726907361355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_german_cv15_v1","de") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_german_cv15_v1", "de") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_german_cv15_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|398.3 MB| + +## References + +https://huggingface.co/flozi00/whisper-base-german-cv15-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_korean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_korean_pipeline_en.md new file mode 100644 index 00000000000000..fdd522a9a2c1fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_korean_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_korean_pipeline pipeline WhisperForCTC from SungBeom +author: John Snow Labs +name: whisper_base_korean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_korean_pipeline` is a English model originally trained by SungBeom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_korean_pipeline_en_5.5.0_3.0_1726894864246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_korean_pipeline_en_5.5.0_3.0_1726894864246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_korean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_korean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_korean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.2 MB| + +## References + +https://huggingface.co/SungBeom/whisper-base-ko + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_lyric_transcription_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_lyric_transcription_pipeline_en.md new file mode 100644 index 00000000000000..f9e539105d552c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_lyric_transcription_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_lyric_transcription_pipeline pipeline WhisperForCTC from peterjwms +author: John Snow Labs +name: whisper_base_lyric_transcription_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_lyric_transcription_pipeline` is a English model originally trained by peterjwms. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_lyric_transcription_pipeline_en_5.5.0_3.0_1726893779769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_lyric_transcription_pipeline_en_5.5.0_3.0_1726893779769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_lyric_transcription_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_lyric_transcription_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_lyric_transcription_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.7 MB| + +## References + +https://huggingface.co/peterjwms/whisper-base-lyric-transcription + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_en.md new file mode 100644 index 00000000000000..fb8cca3d5dbae1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_fine_tuned_base_company_earnings_call_v0_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_fine_tuned_base_company_earnings_call_v0 WhisperForCTC from MasatoShima1618 +author: John Snow Labs +name: whisper_fine_tuned_base_company_earnings_call_v0 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_fine_tuned_base_company_earnings_call_v0` is a English model originally trained by MasatoShima1618. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_fine_tuned_base_company_earnings_call_v0_en_5.5.0_3.0_1726962066021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_fine_tuned_base_company_earnings_call_v0_en_5.5.0_3.0_1726962066021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_fine_tuned_base_company_earnings_call_v0","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_fine_tuned_base_company_earnings_call_v0", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_fine_tuned_base_company_earnings_call_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|642.5 MB| + +## References + +https://huggingface.co/MasatoShima1618/Whisper-fine-tuned-base-company-earnings-call-v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_afrikaans_za_abhinay45_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_afrikaans_za_abhinay45_en.md new file mode 100644 index 00000000000000..a56fec95c734f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_afrikaans_za_abhinay45_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_afrikaans_za_abhinay45 WhisperForCTC from Abhinay45 +author: John Snow Labs +name: whisper_small_afrikaans_za_abhinay45 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_afrikaans_za_abhinay45` is a English model originally trained by Abhinay45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_abhinay45_en_5.5.0_3.0_1726949271124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_afrikaans_za_abhinay45_en_5.5.0_3.0_1726949271124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_afrikaans_za_abhinay45","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_afrikaans_za_abhinay45", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_afrikaans_za_abhinay45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Abhinay45/whisper-small-af-ZA \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_test_draligomaa_dataset_ar.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_test_draligomaa_dataset_ar.md new file mode 100644 index 00000000000000..1e2f91cd5564fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_arabic_test_draligomaa_dataset_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_arabic_test_draligomaa_dataset WhisperForCTC from DrAliGomaa +author: John Snow Labs +name: whisper_small_arabic_test_draligomaa_dataset +date: 2024-09-21 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_test_draligomaa_dataset` is a Arabic model originally trained by DrAliGomaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_test_draligomaa_dataset_ar_5.5.0_3.0_1726936405461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_test_draligomaa_dataset_ar_5.5.0_3.0_1726936405461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_arabic_test_draligomaa_dataset","ar") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_test_draligomaa_dataset", "ar") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_test_draligomaa_dataset| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DrAliGomaa/whisper-small-ar-test-draligomaa-dataset \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_our_dataset_grapheme_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_our_dataset_grapheme_en.md new file mode 100644 index 00000000000000..5528b3ec911e4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_our_dataset_grapheme_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_bengali_our_dataset_grapheme WhisperForCTC from AIFahim +author: John Snow Labs +name: whisper_small_bengali_our_dataset_grapheme +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bengali_our_dataset_grapheme` is a English model originally trained by AIFahim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_our_dataset_grapheme_en_5.5.0_3.0_1726939473804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_our_dataset_grapheme_en_5.5.0_3.0_1726939473804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_bengali_our_dataset_grapheme","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_bengali_our_dataset_grapheme", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bengali_our_dataset_grapheme| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AIFahim/whisper-small-bn_our_dataset_grapheme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_shamik_bn.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_shamik_bn.md new file mode 100644 index 00000000000000..30389fa33e77e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_bengali_shamik_bn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bengali whisper_small_bengali_shamik WhisperForCTC from Shamik +author: John Snow Labs +name: whisper_small_bengali_shamik +date: 2024-09-21 +tags: [bn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bengali_shamik` is a Bengali model originally trained by Shamik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_shamik_bn_5.5.0_3.0_1726905515902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bengali_shamik_bn_5.5.0_3.0_1726905515902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_bengali_shamik","bn") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_bengali_shamik", "bn") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bengali_shamik| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Shamik/whisper-small-bn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_best_de.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_best_de.md new file mode 100644 index 00000000000000..d036b0f528c347 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_best_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_small_best WhisperForCTC from marccgrau +author: John Snow Labs +name: whisper_small_best +date: 2024-09-21 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_best` is a German model originally trained by marccgrau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_best_de_5.5.0_3.0_1726904911476.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_best_de_5.5.0_3.0_1726904911476.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_best","de") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_best", "de") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_best| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|1.7 GB| + +## References + +https://huggingface.co/marccgrau/whisper-small-best \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_cn_patrickml_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_cn_patrickml_en.md new file mode 100644 index 00000000000000..4a10f75a9addd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_cn_patrickml_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_cn_patrickml WhisperForCTC from PatrickML +author: John Snow Labs +name: whisper_small_cn_patrickml +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_cn_patrickml` is a English model originally trained by PatrickML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_cn_patrickml_en_5.5.0_3.0_1726910991376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_cn_patrickml_en_5.5.0_3.0_1726910991376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_cn_patrickml","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_cn_patrickml", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_cn_patrickml| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/PatrickML/whisper-small-CN \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_mskov_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_mskov_pipeline_en.md new file mode 100644 index 00000000000000..ce5f251e40a532 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_english_mskov_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_english_mskov_pipeline pipeline WhisperForCTC from mskov +author: John Snow Labs +name: whisper_small_english_mskov_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_english_mskov_pipeline` is a English model originally trained by mskov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_english_mskov_pipeline_en_5.5.0_3.0_1726950224671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_english_mskov_pipeline_en_5.5.0_3.0_1726950224671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_english_mskov_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_english_mskov_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_english_mskov_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.1 GB| + +## References + +https://huggingface.co/mskov/whisper-small.en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ger_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ger_en.md new file mode 100644 index 00000000000000..71dc2cdc04f3c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_ger_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_ger WhisperForCTC from daniel123321 +author: John Snow Labs +name: whisper_small_ger +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ger` is a English model originally trained by daniel123321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ger_en_5.5.0_3.0_1726894944274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ger_en_5.5.0_3.0_1726894944274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_ger","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_ger", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ger| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/daniel123321/whisper-small-ger \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hre6_nu_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hre6_nu_en.md new file mode 100644 index 00000000000000..ca1129552d45a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hre6_nu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hre6_nu WhisperForCTC from ntviet +author: John Snow Labs +name: whisper_small_hre6_nu +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hre6_nu` is a English model originally trained by ntviet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hre6_nu_en_5.5.0_3.0_1726910973262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hre6_nu_en_5.5.0_3.0_1726910973262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_hre6_nu","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_hre6_nu", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hre6_nu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ntviet/whisper-small-hre6-nu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_test_pipeline_en.md new file mode 100644 index 00000000000000..1df7e27b76c8c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_korean_test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_korean_test_pipeline pipeline WhisperForCTC from Rifky +author: John Snow Labs +name: whisper_small_korean_test_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_korean_test_pipeline` is a English model originally trained by Rifky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_korean_test_pipeline_en_5.5.0_3.0_1726949662917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_korean_test_pipeline_en_5.5.0_3.0_1726949662917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_korean_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_korean_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_korean_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Rifky/whisper-small-ko-test + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_kurdish_ckb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_kurdish_ckb_pipeline_en.md new file mode 100644 index 00000000000000..4073a5537b2f32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_kurdish_ckb_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_kurdish_ckb_pipeline pipeline WhisperForCTC from roshna-omer +author: John Snow Labs +name: whisper_small_kurdish_ckb_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_kurdish_ckb_pipeline` is a English model originally trained by roshna-omer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_kurdish_ckb_pipeline_en_5.5.0_3.0_1726950220134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_kurdish_ckb_pipeline_en_5.5.0_3.0_1726950220134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_kurdish_ckb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_kurdish_ckb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_kurdish_ckb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/roshna-omer/whisper-small-ku-ckb + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_oriya_v1_3_or.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_oriya_v1_3_or.md new file mode 100644 index 00000000000000..348d8625de6f92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_oriya_v1_3_or.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Oriya (macrolanguage) whisper_small_oriya_v1_3 WhisperForCTC from chandan3007 +author: John Snow Labs +name: whisper_small_oriya_v1_3 +date: 2024-09-21 +tags: [or, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: or +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_oriya_v1_3` is a Oriya (macrolanguage) model originally trained by chandan3007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_oriya_v1_3_or_5.5.0_3.0_1726937284746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_oriya_v1_3_or_5.5.0_3.0_1726937284746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_oriya_v1_3","or") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_oriya_v1_3", "or") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_oriya_v1_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|or| +|Size:|1.7 GB| + +## References + +https://huggingface.co/chandan3007/whisper-small-oriya-v1.3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pipeline_dv.md new file mode 100644 index 00000000000000..fb1efb04f4f85d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_pipeline pipeline WhisperForCTC from ClementXie +author: John Snow Labs +name: whisper_small_pipeline +date: 2024-09-21 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by ClementXie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_pipeline_dv_5.5.0_3.0_1726950865208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_pipeline_dv_5.5.0_3.0_1726950865208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_pipeline", lang = "dv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_pipeline", lang = "dv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ClementXie/whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_1_pt.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_1_pt.md new file mode 100644 index 00000000000000..c46eddca2eab8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_portuguese_1_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_1 WhisperForCTC from Berly00 +author: John Snow Labs +name: whisper_small_portuguese_1 +date: 2024-09-21 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_1` is a Portuguese model originally trained by Berly00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_1_pt_5.5.0_3.0_1726939074785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_1_pt_5.5.0_3.0_1726939074785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_1","pt") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_1", "pt") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Berly00/whisper-small-portuguese-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rixvox_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rixvox_pipeline_en.md new file mode 100644 index 00000000000000..522b2a853e11fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_rixvox_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_rixvox_pipeline pipeline WhisperForCTC from KBLab +author: John Snow Labs +name: whisper_small_rixvox_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_rixvox_pipeline` is a English model originally trained by KBLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_rixvox_pipeline_en_5.5.0_3.0_1726893336587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_rixvox_pipeline_en_5.5.0_3.0_1726893336587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_rixvox_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_rixvox_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_rixvox_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/KBLab/whisper-small-rixvox + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_rscolati_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_rscolati_pipeline_en.md new file mode 100644 index 00000000000000..e1e36433d208d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_swedish_rscolati_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_swedish_rscolati_pipeline pipeline WhisperForCTC from rscolati +author: John Snow Labs +name: whisper_small_swedish_rscolati_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swedish_rscolati_pipeline` is a English model originally trained by rscolati. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_rscolati_pipeline_en_5.5.0_3.0_1726893273793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swedish_rscolati_pipeline_en_5.5.0_3.0_1726893273793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_swedish_rscolati_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_swedish_rscolati_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swedish_rscolati_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rscolati/whisper-small-sv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_telugu_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_telugu_pipeline_hi.md new file mode 100644 index 00000000000000..2c07de19d1f2eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_telugu_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_telugu_pipeline pipeline WhisperForCTC from Mukund017 +author: John Snow Labs +name: whisper_small_telugu_pipeline +date: 2024-09-21 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_telugu_pipeline` is a Hindi model originally trained by Mukund017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_pipeline_hi_5.5.0_3.0_1726949049377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_telugu_pipeline_hi_5.5.0_3.0_1726949049377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_telugu_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_telugu_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_telugu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Mukund017/whisper-small-telugu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_pipeline_en.md new file mode 100644 index 00000000000000..937a27eb5a8970 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_english_korif_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_english_korif_pipeline pipeline WhisperForCTC from KoRiF +author: John Snow Labs +name: whisper_tiny_english_korif_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_english_korif_pipeline` is a English model originally trained by KoRiF. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_korif_pipeline_en_5.5.0_3.0_1726962056114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_english_korif_pipeline_en_5.5.0_3.0_1726962056114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_english_korif_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_english_korif_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_english_korif_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/KoRiF/whisper-tiny-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_pipeline_fr.md new file mode 100644 index 00000000000000..b499b210d38a5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_frenchmed_v1_pipeline_fr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: French whisper_tiny_frenchmed_v1_pipeline pipeline WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_tiny_frenchmed_v1_pipeline +date: 2024-09-21 +tags: [fr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_frenchmed_v1_pipeline` is a French model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_frenchmed_v1_pipeline_fr_5.5.0_3.0_1726939396382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_frenchmed_v1_pipeline_fr_5.5.0_3.0_1726939396382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_frenchmed_v1_pipeline", lang = "fr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_frenchmed_v1_pipeline", lang = "fr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_frenchmed_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|379.2 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-tiny-frenchmed-v1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_en.md new file mode 100644 index 00000000000000..74ba20c8442f80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_hindi_common_voice_16_1_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_hindi_common_voice_16_1 WhisperForCTC from archit342000 +author: John Snow Labs +name: whisper_tiny_hindi_common_voice_16_1 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hindi_common_voice_16_1` is a English model originally trained by archit342000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_common_voice_16_1_en_5.5.0_3.0_1726961919587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hindi_common_voice_16_1_en_5.5.0_3.0_1726961919587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_hindi_common_voice_16_1","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_hindi_common_voice_16_1", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hindi_common_voice_16_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.7 MB| + +## References + +https://huggingface.co/archit342000/whisper_tiny_hi_common_voice_16_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_iammartian0_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_iammartian0_en.md new file mode 100644 index 00000000000000..d69f8dd26b300e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_minds14_iammartian0_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_iammartian0 WhisperForCTC from iammartian0 +author: John Snow Labs +name: whisper_tiny_minds14_iammartian0 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_iammartian0` is a English model originally trained by iammartian0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_iammartian0_en_5.5.0_3.0_1726950885193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_iammartian0_en_5.5.0_3.0_1726950885193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_iammartian0","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_iammartian0", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_iammartian0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/iammartian0/whisper-tiny-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_pipeline_vi.md new file mode 100644 index 00000000000000..8a06344cebf569 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_pipeline_vi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Vietnamese whisper_tiny_vietnamese_doof_ferb_pipeline pipeline WhisperForCTC from doof-ferb +author: John Snow Labs +name: whisper_tiny_vietnamese_doof_ferb_pipeline +date: 2024-09-21 +tags: [vi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_vietnamese_doof_ferb_pipeline` is a Vietnamese model originally trained by doof-ferb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_vietnamese_doof_ferb_pipeline_vi_5.5.0_3.0_1726938005333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_vietnamese_doof_ferb_pipeline_vi_5.5.0_3.0_1726938005333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_vietnamese_doof_ferb_pipeline", lang = "vi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_vietnamese_doof_ferb_pipeline", lang = "vi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_vietnamese_doof_ferb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|390.0 MB| + +## References + +https://huggingface.co/doof-ferb/whisper-tiny-vi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_vi.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_vi.md new file mode 100644 index 00000000000000..468f84906c02d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_vietnamese_doof_ferb_vi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Vietnamese whisper_tiny_vietnamese_doof_ferb WhisperForCTC from doof-ferb +author: John Snow Labs +name: whisper_tiny_vietnamese_doof_ferb +date: 2024-09-21 +tags: [vi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_vietnamese_doof_ferb` is a Vietnamese model originally trained by doof-ferb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_vietnamese_doof_ferb_vi_5.5.0_3.0_1726937985505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_vietnamese_doof_ferb_vi_5.5.0_3.0_1726937985505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_vietnamese_doof_ferb","vi") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_vietnamese_doof_ferb", "vi") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_vietnamese_doof_ferb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|vi| +|Size:|390.0 MB| + +## References + +https://huggingface.co/doof-ferb/whisper-tiny-vi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_en.md new file mode 100644 index 00000000000000..f75b07c0ae2b74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_yosthingalindo WhisperForCTC from yosthin06 +author: John Snow Labs +name: whisper_tiny_yosthingalindo +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_yosthingalindo` is a English model originally trained by yosthin06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_yosthingalindo_en_5.5.0_3.0_1726949599493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_yosthingalindo_en_5.5.0_3.0_1726949599493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_yosthingalindo","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_yosthingalindo", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_yosthingalindo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/yosthin06/whisper-tiny_yosthingalindo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_pipeline_en.md new file mode 100644 index 00000000000000..62e623ae2aa3f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_yosthingalindo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_yosthingalindo_pipeline pipeline WhisperForCTC from yosthin06 +author: John Snow Labs +name: whisper_tiny_yosthingalindo_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_yosthingalindo_pipeline` is a English model originally trained by yosthin06. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_yosthingalindo_pipeline_en_5.5.0_3.0_1726949618522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_yosthingalindo_pipeline_en_5.5.0_3.0_1726949618522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_yosthingalindo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_yosthingalindo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_yosthingalindo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/yosthin06/whisper-tiny_yosthingalindo + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_addressbook_test_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_addressbook_test_tags_cwadj_en.md new file mode 100644 index 00000000000000..2a3665e5b13b3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_addressbook_test_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_addressbook_test_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_addressbook_test_tags_cwadj +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_addressbook_test_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_tags_cwadj_en_5.5.0_3.0_1726953244842.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_tags_cwadj_en_5.5.0_3.0_1726953244842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_addressbook_test_tags_cwadj","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_addressbook_test_tags_cwadj", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_addressbook_test_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-addressbook_test-tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en.md new file mode 100644 index 00000000000000..6a81bdd6151567 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en_5.5.0_3.0_1726932490977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline_en_5.5.0_3.0_1726932490977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_mixed_aug_insert_w2v_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|795.2 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_Mixed-aug_insert_w2v + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en.md new file mode 100644 index 00000000000000..2128bf66ad6b2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jhagege_pipeline pipeline XlmRoBertaForTokenClassification from jhagege +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jhagege_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_jhagege_pipeline` is a English model originally trained by jhagege. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en_5.5.0_3.0_1726884261312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jhagege_pipeline_en_5.5.0_3.0_1726884261312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jhagege_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jhagege_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jhagege_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/jhagege/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en.md new file mode 100644 index 00000000000000..2a7ded5ae2882a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline pipeline XlmRoBertaForTokenClassification from marko-vasic +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline` is a English model originally trained by marko-vasic. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en_5.5.0_3.0_1726883977260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline_en_5.5.0_3.0_1726883977260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_marko_vasic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/marko-vasic/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en.md new file mode 100644 index 00000000000000..ec2a6c3108404f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_vietnguyen1989 XlmRoBertaForTokenClassification from vietnguyen1989 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_vietnguyen1989 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_vietnguyen1989` is a English model originally trained by vietnguyen1989. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en_5.5.0_3.0_1726897085747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_vietnguyen1989_en_5.5.0_3.0_1726897085747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_vietnguyen1989","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_vietnguyen1989", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_vietnguyen1989| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/vietnguyen1989/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_sst2_100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_sst2_100_pipeline_en.md new file mode 100644 index 00000000000000..17e993b8d72560 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_sst2_100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_sst2_100_pipeline pipeline XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_sst2_100_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_sst2_100_pipeline` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_100_pipeline_en_5.5.0_3.0_1726933512849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_100_pipeline_en_5.5.0_3.0_1726933512849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_sst2_100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_sst2_100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_sst2_100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|779.4 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-sst2-100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_en.md b/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_en.md new file mode 100644 index 00000000000000..e9fedf50dac437 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-0_000003_0_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 0_000003_0_9 RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_000003_0_9 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_000003_0_9` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_000003_0_9_en_5.5.0_3.0_1727016737959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_000003_0_9_en_5.5.0_3.0_1727016737959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_000003_0_9","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("0_000003_0_9", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_000003_0_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.000003_0.9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_en.md b/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_en.md new file mode 100644 index 00000000000000..f844c8b9a73155 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 1121223_hw01 DistilBertForSequenceClassification from hcyang0401 +author: John Snow Labs +name: 1121223_hw01 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1121223_hw01` is a English model originally trained by hcyang0401. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1121223_hw01_en_5.5.0_3.0_1726980646376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1121223_hw01_en_5.5.0_3.0_1726980646376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("1121223_hw01","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("1121223_hw01", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1121223_hw01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hcyang0401/1121223_HW01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_pipeline_en.md new file mode 100644 index 00000000000000..93fb2dc012900b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-1121223_hw01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 1121223_hw01_pipeline pipeline DistilBertForSequenceClassification from hcyang0401 +author: John Snow Labs +name: 1121223_hw01_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1121223_hw01_pipeline` is a English model originally trained by hcyang0401. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1121223_hw01_pipeline_en_5.5.0_3.0_1726980658613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1121223_hw01_pipeline_en_5.5.0_3.0_1726980658613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("1121223_hw01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("1121223_hw01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1121223_hw01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hcyang0401/1121223_HW01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-4412_model_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-4412_model_final_pipeline_en.md new file mode 100644 index 00000000000000..20fe0d45aff86b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-4412_model_final_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 4412_model_final_pipeline pipeline DistilBertForSequenceClassification from nadim365 +author: John Snow Labs +name: 4412_model_final_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`4412_model_final_pipeline` is a English model originally trained by nadim365. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/4412_model_final_pipeline_en_5.5.0_3.0_1727033400460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/4412_model_final_pipeline_en_5.5.0_3.0_1727033400460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("4412_model_final_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("4412_model_final_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|4412_model_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nadim365/4412-model-final + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_en.md new file mode 100644 index 00000000000000..1b59c8f18c58d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ai_text_model BertForSequenceClassification from KaranNag +author: John Snow Labs +name: ai_text_model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_text_model` is a English model originally trained by KaranNag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_text_model_en_5.5.0_3.0_1727032334648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_text_model_en_5.5.0_3.0_1727032334648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("ai_text_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("ai_text_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_text_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/KaranNag/Ai_text_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-aigc_detector_zhv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-aigc_detector_zhv2_pipeline_en.md new file mode 100644 index 00000000000000..24c4ebe612a715 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-aigc_detector_zhv2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aigc_detector_zhv2_pipeline pipeline BertForSequenceClassification from yuchuantian +author: John Snow Labs +name: aigc_detector_zhv2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aigc_detector_zhv2_pipeline` is a English model originally trained by yuchuantian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aigc_detector_zhv2_pipeline_en_5.5.0_3.0_1727034670519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aigc_detector_zhv2_pipeline_en_5.5.0_3.0_1727034670519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("aigc_detector_zhv2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("aigc_detector_zhv2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aigc_detector_zhv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|383.2 MB| + +## References + +https://huggingface.co/yuchuantian/AIGC_detector_zhv2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-akai_flow_classifier_kmai_dev_test_bot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-akai_flow_classifier_kmai_dev_test_bot_pipeline_en.md new file mode 100644 index 00000000000000..859ac0793d2495 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-akai_flow_classifier_kmai_dev_test_bot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English akai_flow_classifier_kmai_dev_test_bot_pipeline pipeline BertForSequenceClassification from GautamR +author: John Snow Labs +name: akai_flow_classifier_kmai_dev_test_bot_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`akai_flow_classifier_kmai_dev_test_bot_pipeline` is a English model originally trained by GautamR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/akai_flow_classifier_kmai_dev_test_bot_pipeline_en_5.5.0_3.0_1727007428815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/akai_flow_classifier_kmai_dev_test_bot_pipeline_en_5.5.0_3.0_1727007428815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("akai_flow_classifier_kmai_dev_test_bot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("akai_flow_classifier_kmai_dev_test_bot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|akai_flow_classifier_kmai_dev_test_bot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/GautamR/akai_flow_classifier_kmai_dev_test_bot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1000_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1000_16_5_oos_en.md new file mode 100644 index 00000000000000..7cec4993506a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1000_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_home_1000_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_home_1000_16_5_oos +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_home_1000_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1000_16_5_oos_en_5.5.0_3.0_1727036964269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1000_16_5_oos_en_5.5.0_3.0_1727036964269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_1000_16_5_oos","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_1000_16_5_oos", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_home_1000_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-home-1000-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_en.md new file mode 100644 index 00000000000000..26f017bc705c3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_home_1_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_home_1_16_5 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_home_1_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1_16_5_en_5.5.0_3.0_1727027488070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1_16_5_en_5.5.0_3.0_1727027488070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_1_16_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_home_1_16_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_home_1_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-home-1-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_pipeline_en.md new file mode 100644 index 00000000000000..7bcb547c135ce5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-all_roberta_large_v1_home_1_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_home_1_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_home_1_16_5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_home_1_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1_16_5_pipeline_en_5.5.0_3.0_1727027563561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_home_1_16_5_pipeline_en_5.5.0_3.0_1727027563561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_home_1_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_home_1_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_home_1_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-home-1-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-araroberta_jo_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-22-araroberta_jo_pipeline_ar.md new file mode 100644 index 00000000000000..3ee221e742c253 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-araroberta_jo_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic araroberta_jo_pipeline pipeline RoBertaEmbeddings from reemalyami +author: John Snow Labs +name: araroberta_jo_pipeline +date: 2024-09-22 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`araroberta_jo_pipeline` is a Arabic model originally trained by reemalyami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/araroberta_jo_pipeline_ar_5.5.0_3.0_1726999534008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/araroberta_jo_pipeline_ar_5.5.0_3.0_1726999534008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("araroberta_jo_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("araroberta_jo_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|araroberta_jo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|470.7 MB| + +## References + +https://huggingface.co/reemalyami/AraRoBERTa-JO + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-augmented_model_fast_2b_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-augmented_model_fast_2b_pipeline_en.md new file mode 100644 index 00000000000000..21b65570a446c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-augmented_model_fast_2b_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English augmented_model_fast_2b_pipeline pipeline DistilBertForSequenceClassification from LeonardoFettucciari +author: John Snow Labs +name: augmented_model_fast_2b_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`augmented_model_fast_2b_pipeline` is a English model originally trained by LeonardoFettucciari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/augmented_model_fast_2b_pipeline_en_5.5.0_3.0_1727033153203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/augmented_model_fast_2b_pipeline_en_5.5.0_3.0_1727033153203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("augmented_model_fast_2b_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("augmented_model_fast_2b_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|augmented_model_fast_2b_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeonardoFettucciari/augmented_model_fast_2b + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-auro_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-auro_1_pipeline_en.md new file mode 100644 index 00000000000000..9da52e866f0c9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-auro_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English auro_1_pipeline pipeline RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: auro_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`auro_1_pipeline` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/auro_1_pipeline_en_5.5.0_3.0_1726972032341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/auro_1_pipeline_en_5.5.0_3.0_1726972032341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("auro_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("auro_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|auro_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/AURO_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-autonlp_bp_29016523_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-autonlp_bp_29016523_pipeline_en.md new file mode 100644 index 00000000000000..4cee505e1ceba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-autonlp_bp_29016523_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autonlp_bp_29016523_pipeline pipeline BertForSequenceClassification from JushBJJ +author: John Snow Labs +name: autonlp_bp_29016523_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autonlp_bp_29016523_pipeline` is a English model originally trained by JushBJJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autonlp_bp_29016523_pipeline_en_5.5.0_3.0_1727007984994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autonlp_bp_29016523_pipeline_en_5.5.0_3.0_1727007984994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autonlp_bp_29016523_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autonlp_bp_29016523_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autonlp_bp_29016523_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/JushBJJ/autonlp-bp-29016523 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_en.md b/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_en.md new file mode 100644 index 00000000000000..d21bfbe1d01d7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English autotrain_2_roberta_r_53890126861 BertForTokenClassification from tinyYhorm +author: John Snow Labs +name: autotrain_2_roberta_r_53890126861 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_2_roberta_r_53890126861` is a English model originally trained by tinyYhorm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_2_roberta_r_53890126861_en_5.5.0_3.0_1726977296993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_2_roberta_r_53890126861_en_5.5.0_3.0_1726977296993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("autotrain_2_roberta_r_53890126861","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("autotrain_2_roberta_r_53890126861", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_2_roberta_r_53890126861| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/tinyYhorm/autotrain-2-roberta-r-53890126861 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_pipeline_en.md new file mode 100644 index 00000000000000..a7cc00696f6cb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-autotrain_2_roberta_r_53890126861_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_2_roberta_r_53890126861_pipeline pipeline BertForTokenClassification from tinyYhorm +author: John Snow Labs +name: autotrain_2_roberta_r_53890126861_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_2_roberta_r_53890126861_pipeline` is a English model originally trained by tinyYhorm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_2_roberta_r_53890126861_pipeline_en_5.5.0_3.0_1726977313813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_2_roberta_r_53890126861_pipeline_en_5.5.0_3.0_1726977313813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_2_roberta_r_53890126861_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_2_roberta_r_53890126861_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_2_roberta_r_53890126861_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/tinyYhorm/autotrain-2-roberta-r-53890126861 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en.md new file mode 100644 index 00000000000000..f70ebb245cf551 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2 BertForTokenClassification from Jsevisal +author: John Snow Labs +name: balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2` is a English model originally trained by Jsevisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en_5.5.0_3.0_1727041010402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_en_5.5.0_3.0_1727041010402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Jsevisal/balanced-augmented-bert-large-gest-pred-seqeval-partialmatch-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en.md new file mode 100644 index 00000000000000..5ae59ff6ca7c36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline pipeline BertForTokenClassification from Jsevisal +author: John Snow Labs +name: balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline` is a English model originally trained by Jsevisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en_5.5.0_3.0_1727041070860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline_en_5.5.0_3.0_1727041070860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|balanced_augmented_bert_large_gest_pred_seqeval_partialmatch_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Jsevisal/balanced-augmented-bert-large-gest-pred-seqeval-partialmatch-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_en.md new file mode 100644 index 00000000000000..6a413e512c597f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_bosnian_16 BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_cased_finetuned_squad_bosnian_16 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_bosnian_16` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_bosnian_16_en_5.5.0_3.0_1727042658834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_bosnian_16_en_5.5.0_3.0_1727042658834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad_bosnian_16","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_finetuned_squad_bosnian_16", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_bosnian_16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-cased-finetuned-squad-bs_16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_pipeline_en.md new file mode 100644 index 00000000000000..40b9c30c23c7de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_squad_bosnian_16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_finetuned_squad_bosnian_16_pipeline pipeline BertForQuestionAnswering from lauraparra28 +author: John Snow Labs +name: bert_base_cased_finetuned_squad_bosnian_16_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_squad_bosnian_16_pipeline` is a English model originally trained by lauraparra28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_bosnian_16_pipeline_en_5.5.0_3.0_1727042679168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_squad_bosnian_16_pipeline_en_5.5.0_3.0_1727042679168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_finetuned_squad_bosnian_16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_squad_bosnian_16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_squad_bosnian_16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lauraparra28/bert-base-cased-finetuned-squad-bs_16 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_en.md new file mode 100644 index 00000000000000..3f1aaf1147fc07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_ner BertForTokenClassification from zhiguoxu +author: John Snow Labs +name: bert_base_chinese_finetuned_ner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_ner` is a English model originally trained by zhiguoxu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_en_5.5.0_3.0_1727015818418.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_en_5.5.0_3.0_1727015818418.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_chinese_finetuned_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/zhiguoxu/bert-base-chinese-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_pipeline_en.md new file mode 100644 index 00000000000000..e4bb21966f99b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_ner_pipeline pipeline BertForTokenClassification from zhiguoxu +author: John Snow Labs +name: bert_base_chinese_finetuned_ner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_ner_pipeline` is a English model originally trained by zhiguoxu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_pipeline_en_5.5.0_3.0_1727015834998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_ner_pipeline_en_5.5.0_3.0_1727015834998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_chinese_finetuned_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_chinese_finetuned_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.2 MB| + +## References + +https://huggingface.co/zhiguoxu/bert-base-chinese-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_en.md new file mode 100644 index 00000000000000..9258c2f608fcb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_qa_b32 BertForQuestionAnswering from sharkMeow +author: John Snow Labs +name: bert_base_chinese_finetuned_qa_b32 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_qa_b32` is a English model originally trained by sharkMeow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_qa_b32_en_5.5.0_3.0_1727049451096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_qa_b32_en_5.5.0_3.0_1727049451096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_qa_b32","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_qa_b32", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_qa_b32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/sharkMeow/bert-base-chinese-finetuned-QA-b32 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_pipeline_en.md new file mode 100644 index 00000000000000..2267f1edf952cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_chinese_finetuned_qa_b32_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_qa_b32_pipeline pipeline BertForQuestionAnswering from sharkMeow +author: John Snow Labs +name: bert_base_chinese_finetuned_qa_b32_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_qa_b32_pipeline` is a English model originally trained by sharkMeow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_qa_b32_pipeline_en_5.5.0_3.0_1727049469914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_qa_b32_pipeline_en_5.5.0_3.0_1727049469914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_chinese_finetuned_qa_b32_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_chinese_finetuned_qa_b32_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_qa_b32_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/sharkMeow/bert-base-chinese-finetuned-QA-b32 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_en.md new file mode 100644 index 00000000000000..860afdf2fdb082 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_embedding BertEmbeddings from CH3COOK +author: John Snow Labs +name: bert_base_embedding +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_embedding` is a English model originally trained by CH3COOK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_embedding_en_5.5.0_3.0_1727002932073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_embedding_en_5.5.0_3.0_1727002932073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_embedding","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_embedding","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_embedding| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/CH3COOK/bert-base-embedding \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_pipeline_en.md new file mode 100644 index 00000000000000..033d229d7d00ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_embedding_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_embedding_pipeline pipeline BertEmbeddings from CH3COOK +author: John Snow Labs +name: bert_base_embedding_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_embedding_pipeline` is a English model originally trained by CH3COOK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_embedding_pipeline_en_5.5.0_3.0_1727002949825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_embedding_pipeline_en_5.5.0_3.0_1727002949825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_embedding_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_embedding_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_embedding_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.0 MB| + +## References + +https://huggingface.co/CH3COOK/bert-base-embedding + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx.md new file mode 100644 index 00000000000000..6e5bf8c198d480 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline pipeline BertForQuestionAnswering from Nadav +author: John Snow Labs +name: bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline` is a Multilingual model originally trained by Nadav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx_5.5.0_3.0_1727049409312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline_xx_5.5.0_3.0_1727049409312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_historic_multilingual_64k_td_cased_squad_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|504.6 MB| + +## References + +https://huggingface.co/Nadav/bert-base-historic-multilingual-64k-td-cased-squad-fr + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_en.md new file mode 100644 index 00000000000000..fe70b4a7d8f027 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_italian_xxl_uncased_italian_finetuned_emotions BertForSequenceClassification from MelmaGrigia +author: John Snow Labs +name: bert_base_italian_xxl_uncased_italian_finetuned_emotions +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_italian_xxl_uncased_italian_finetuned_emotions` is a English model originally trained by MelmaGrigia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_italian_finetuned_emotions_en_5.5.0_3.0_1727030079822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_italian_finetuned_emotions_en_5.5.0_3.0_1727030079822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_italian_xxl_uncased_italian_finetuned_emotions","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_italian_xxl_uncased_italian_finetuned_emotions", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_italian_xxl_uncased_italian_finetuned_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/MelmaGrigia/bert-base-italian-xxl-uncased-italian-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en.md new file mode 100644 index 00000000000000..497ba497ee3ba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline pipeline BertForSequenceClassification from MelmaGrigia +author: John Snow Labs +name: bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline` is a English model originally trained by MelmaGrigia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en_5.5.0_3.0_1727030100823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline_en_5.5.0_3.0_1727030100823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_italian_xxl_uncased_italian_finetuned_emotions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/MelmaGrigia/bert-base-italian-xxl-uncased-italian-finetuned-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_pipeline_xx.md new file mode 100644 index 00000000000000..da4c34d00fb427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_pipeline_xx.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_squad_ani2857_pipeline pipeline BertForQuestionAnswering from ani2857 +author: John Snow Labs +name: bert_base_multilingual_cased_squad_ani2857_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_squad_ani2857_pipeline` is a Multilingual model originally trained by ani2857. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_squad_ani2857_pipeline_xx_5.5.0_3.0_1727049523710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_squad_ani2857_pipeline_xx_5.5.0_3.0_1727049523710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_squad_ani2857_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_squad_ani2857_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_squad_ani2857_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ani2857/bert-base-multilingual-cased-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_xx.md new file mode 100644 index 00000000000000..69f8b8eed6743b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_cased_squad_ani2857_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_squad_ani2857 BertForQuestionAnswering from ani2857 +author: John Snow Labs +name: bert_base_multilingual_cased_squad_ani2857 +date: 2024-09-22 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_squad_ani2857` is a Multilingual model originally trained by ani2857. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_squad_ani2857_xx_5.5.0_3.0_1727049492036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_squad_ani2857_xx_5.5.0_3.0_1727049492036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_squad_ani2857","xx") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_multilingual_cased_squad_ani2857", "xx") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_squad_ani2857| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ani2857/bert-base-multilingual-cased-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_xx.md new file mode 100644 index 00000000000000..adbdef9e2d0505 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_navenprasad BertForSequenceClassification from navenprasad +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_navenprasad +date: 2024-09-22 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_navenprasad` is a Multilingual model originally trained by navenprasad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_navenprasad_xx_5.5.0_3.0_1727034454039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_navenprasad_xx_5.5.0_3.0_1727034454039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_navenprasad","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_navenprasad", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_navenprasad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.7 MB| + +## References + +https://huggingface.co/navenprasad/bert-base-multilingual-uncased-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en.md new file mode 100644 index 00000000000000..f7e83f487c0373 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en_5.5.0_3.0_1727042449961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_en_5.5.0_3.0_1727042449961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915003326 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en.md new file mode 100644 index 00000000000000..10533e9e440592 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en_5.5.0_3.0_1727042470722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline_en_5.5.0_3.0_1727042470722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915003326_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915003326 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en.md new file mode 100644 index 00000000000000..54218906163d89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en_5.5.0_3.0_1727039402106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline_en_5.5.0_3.0_1727039402106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915121227_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915121227 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en.md new file mode 100644 index 00000000000000..e0b7075885580a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en_5.5.0_3.0_1727039525805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_en_5.5.0_3.0_1727039525805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915122349 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en.md new file mode 100644 index 00000000000000..47f2a5fcdf053b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en_5.5.0_3.0_1727039808767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_en_5.5.0_3.0_1727039808767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915123325 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en.md new file mode 100644 index 00000000000000..f7353c2c15facc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en_5.5.0_3.0_1727039829562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline_en_5.5.0_3.0_1727039829562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915123325_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915123325 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_coqa_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_coqa_en.md new file mode 100644 index 00000000000000..0e43327853c8d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_coqa_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_coqa BertForQuestionAnswering from rooftopcoder +author: John Snow Labs +name: bert_base_uncased_coqa +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_coqa` is a English model originally trained by rooftopcoder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_coqa_en_5.5.0_3.0_1726991819132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_coqa_en_5.5.0_3.0_1726991819132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_coqa","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_coqa", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_coqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/rooftopcoder/bert-base-uncased-coqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_pipeline_en.md new file mode 100644 index 00000000000000..35285e1a9cd778 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_english_sentweet_profane_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_english_sentweet_profane_pipeline pipeline BertForSequenceClassification from jayanta +author: John Snow Labs +name: bert_base_uncased_english_sentweet_profane_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_english_sentweet_profane_pipeline` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_english_sentweet_profane_pipeline_en_5.5.0_3.0_1727030027707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_english_sentweet_profane_pipeline_en_5.5.0_3.0_1727030027707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_english_sentweet_profane_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_english_sentweet_profane_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_english_sentweet_profane_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jayanta/bert-base-uncased-english-sentweet-profane + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..0846ee1818c970 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727042598681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727042598681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-10.0-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..5a95914e34bc7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727042622649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727042622649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_10_0_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-10.0-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en.md new file mode 100644 index 00000000000000..a43271e4ba1dd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1727042888237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300_en_5.5.0_3.0_1727042888237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_29_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.29-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..57b4b1ba89d9c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726991729708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1726991729708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_56_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.56-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..0508d630ecac0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1726992034386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1726992034386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_2_b_32_lr_1_2e_06_dp_0_3_swati_500_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.2-b-32-lr-1.2e-06-dp-0.3-ss-500-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..109063d72037e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727042414599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727042414599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_62_b_32_lr_4e_07_dp_1_0_swati_600_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.62-b-32-lr-4e-07-dp-1.0-ss-600-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..ec9d162e305811 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727049216838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727049216838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_69_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.69-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..0dcfd66f76c494 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727049329348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727049329348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.11-b-32-lr-4e-06-dp-0.1-ss-700-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..3cc32ba1e4ac6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727049354258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727049354258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_11_b_32_lr_4e_06_dp_0_1_swati_700_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.11-b-32-lr-4e-06-dp-0.1-ss-700-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..47aae29b6b3bc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727042880900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727042880900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.44-b-32-lr-1.2e-06-dp-0.3-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..0fa3964e77fd11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727042901943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727042901943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_44_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.44-b-32-lr-1.2e-06-dp-0.3-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en.md new file mode 100644 index 00000000000000..53f19cb1f007ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en_5.5.0_3.0_1726992065533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000_en_5.5.0_3.0_1726992065533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_8_lr_1e_06_wd_0_001_dp_0_99999_swati_20000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.8-lr-1e-06-wd-0.001-dp-0.99999-ss-20000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en.md new file mode 100644 index 00000000000000..49c94a4c5c3d49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en_5.5.0_3.0_1727042493060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000_en_5.5.0_3.0_1727042493060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_05_wd_0_001_dp_0_99999_swati_70000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-05-wd-0.001-dp-0.99999-ss-70000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en.md new file mode 100644 index 00000000000000..5fa86b91ae2145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en_5.5.0_3.0_1727042442666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_en_5.5.0_3.0_1727042442666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-500-st-True-fh-True \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en.md new file mode 100644 index 00000000000000..27f51a4dbe8bf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en_5.5.0_3.0_1727049583991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100_en_5.5.0_3.0_1727049583991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_4e_07_wd_1e_05_dp_1_0_swati_0_southern_sotho_false_fh_false_hs_100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-4e-07-wd-1e-05-dp-1.0-ss-0-st-False-fh-False-hs-100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en.md new file mode 100644 index 00000000000000..0f4558287e2ca0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en_5.5.0_3.0_1726992335894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0_en_5.5.0_3.0_1726992335894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_9e_07_wd_0_001_dp_0_999_swati_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-9e-07-wd-0.001-dp-0.999-ss-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en.md new file mode 100644 index 00000000000000..8e866771059a79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en_5.5.0_3.0_1726991700507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000_en_5.5.0_3.0_1726991700507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_1_lr_1e_06_wd_0_001_dp_0_99999_swati_70000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.1-lr-1e-06-wd-0.001-dp-0.99999-ss-70000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en.md new file mode 100644 index 00000000000000..8e6c706bbea257 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en_5.5.0_3.0_1727043024998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline_en_5.5.0_3.0_1727043024998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_3_lr_1e_06_wd_0_001_dp_0_99999_swati_120000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.3-lr-1e-06-wd-0.001-dp-0.99999-ss-120000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en.md new file mode 100644 index 00000000000000..68f0a93dd13744 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en_5.5.0_3.0_1727042575405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_en_5.5.0_3.0_1727042575405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en.md new file mode 100644 index 00000000000000..39ef7fad96c2af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en_5.5.0_3.0_1727042596169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline_en_5.5.0_3.0_1727042596169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_0_0001_wd_0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-0.0001-wd-0.1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en.md new file mode 100644 index 00000000000000..ecdc6ffd21280d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en_5.5.0_3.0_1727042736954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0_en_5.5.0_3.0_1727042736954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_02_swati_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.02-ss-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en.md new file mode 100644 index 00000000000000..4841ad1ffc4052 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en_5.5.0_3.0_1727042715022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_en_5.5.0_3.0_1727042715022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.0-lr-1e-05-wd-0.001-dp-0.99999-ss-800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en.md new file mode 100644 index 00000000000000..8ac7b56cc4f1ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en_5.5.0_3.0_1727042736259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline_en_5.5.0_3.0_1727042736259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_0_lr_1e_05_wd_0_001_dp_0_99999_swati_800_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.0-lr-1e-05-wd-0.001-dp-0.99999-ss-800 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en.md new file mode 100644 index 00000000000000..fd6a49b9c7702c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en_5.5.0_3.0_1727043021678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_en_5.5.0_3.0_1727043021678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-4.0-lr-0.0005-wd-0.01-dp-0.41 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en.md new file mode 100644 index 00000000000000..8d809c25218553 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en_5.5.0_3.0_1727043042350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline_en_5.5.0_3.0_1727043042350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_dp_0_41_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-4.0-lr-0.0005-wd-0.01-dp-0.41 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en.md new file mode 100644 index 00000000000000..9551fabbf7ceea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en_5.5.0_3.0_1726991995181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline_en_5.5.0_3.0_1726991995181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_4_0_lr_0_0005_wd_0_01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-4.0-lr-0.0005-wd-0.01 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_en.md new file mode 100644 index 00000000000000..bc5be16685f9a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned BertForQuestionAnswering from PabloGuinea +author: John Snow Labs +name: bert_base_uncased_finetuned +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned` is a English model originally trained by PabloGuinea. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_en_5.5.0_3.0_1727043049082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_en_5.5.0_3.0_1727043049082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/PabloGuinea/bert-base-uncased-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_en.md new file mode 100644 index 00000000000000..ea8b68831a094f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad2 BertForQuestionAnswering from thewiz +author: John Snow Labs +name: bert_base_uncased_finetuned_squad2 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad2` is a English model originally trained by thewiz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad2_en_5.5.0_3.0_1727042301710.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad2_en_5.5.0_3.0_1727042301710.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetuned_squad2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/thewiz/bert-base-uncased-finetuned-squad2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_pipeline_en.md new file mode 100644 index 00000000000000..ecd06c06b2577a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_squad_v1_pipeline pipeline BertForQuestionAnswering from helenai +author: John Snow Labs +name: bert_base_uncased_squad_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squad_v1_pipeline` is a English model originally trained by helenai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_pipeline_en_5.5.0_3.0_1726978447809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_pipeline_en_5.5.0_3.0_1726978447809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_squad_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_squad_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squad_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/helenai/bert-base-uncased-squad-v1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_en.md new file mode 100644 index 00000000000000..ce24624073851c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_body_shaming DistilBertForSequenceClassification from Christina0824 +author: John Snow Labs +name: bert_body_shaming +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_body_shaming` is a English model originally trained by Christina0824. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_body_shaming_en_5.5.0_3.0_1727012334717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_body_shaming_en_5.5.0_3.0_1727012334717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_body_shaming","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_body_shaming", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_body_shaming| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Christina0824/BERT_body_shaming \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_pipeline_en.md new file mode 100644 index 00000000000000..402c7144b9e225 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_body_shaming_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_body_shaming_pipeline pipeline DistilBertForSequenceClassification from Christina0824 +author: John Snow Labs +name: bert_body_shaming_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_body_shaming_pipeline` is a English model originally trained by Christina0824. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_body_shaming_pipeline_en_5.5.0_3.0_1727012346775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_body_shaming_pipeline_en_5.5.0_3.0_1727012346775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_body_shaming_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_body_shaming_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_body_shaming_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Christina0824/BERT_body_shaming + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_en.md new file mode 100644 index 00000000000000..e1f86c2954bd68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_classifier_sped_transactions BertForSequenceClassification from lcaffreymaffei +author: John Snow Labs +name: bert_classifier_sped_transactions +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_sped_transactions` is a English model originally trained by lcaffreymaffei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_sped_transactions_en_5.5.0_3.0_1727034094533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_sped_transactions_en_5.5.0_3.0_1727034094533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_sped_transactions","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_sped_transactions", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_sped_transactions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/lcaffreymaffei/bert_classifier_sped_transactions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_pipeline_en.md new file mode 100644 index 00000000000000..c5303663b0c9be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_classifier_sped_transactions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_classifier_sped_transactions_pipeline pipeline BertForSequenceClassification from lcaffreymaffei +author: John Snow Labs +name: bert_classifier_sped_transactions_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_sped_transactions_pipeline` is a English model originally trained by lcaffreymaffei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_sped_transactions_pipeline_en_5.5.0_3.0_1727034115140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_sped_transactions_pipeline_en_5.5.0_3.0_1727034115140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_classifier_sped_transactions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_classifier_sped_transactions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_sped_transactions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/lcaffreymaffei/bert_classifier_sped_transactions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_en.md new file mode 100644 index 00000000000000..14109fad0569ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_koakande BertForTokenClassification from koakande +author: John Snow Labs +name: bert_finetuned_ner_koakande +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_koakande` is a English model originally trained by koakande. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_koakande_en_5.5.0_3.0_1727045435304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_koakande_en_5.5.0_3.0_1727045435304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_koakande","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_koakande", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_koakande| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/koakande/bert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_pipeline_en.md new file mode 100644 index 00000000000000..a18befb2f3ea58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_benroma_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_benroma_pipeline pipeline BertForQuestionAnswering from benroma +author: John Snow Labs +name: bert_finetuned_squad_benroma_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_benroma_pipeline` is a English model originally trained by benroma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_benroma_pipeline_en_5.5.0_3.0_1727049406644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_benroma_pipeline_en_5.5.0_3.0_1727049406644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_benroma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_benroma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_benroma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/benroma/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_en.md new file mode 100644 index 00000000000000..554d857f5596f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_robkayinto BertForQuestionAnswering from robkayinto +author: John Snow Labs +name: bert_finetuned_squad_robkayinto +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_robkayinto` is a English model originally trained by robkayinto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_robkayinto_en_5.5.0_3.0_1727042792146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_robkayinto_en_5.5.0_3.0_1727042792146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_robkayinto","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_robkayinto", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_robkayinto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/robkayinto/bert-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_pipeline_en.md new file mode 100644 index 00000000000000..b4a6ae054f3317 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_squad_robkayinto_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_robkayinto_pipeline pipeline BertForQuestionAnswering from robkayinto +author: John Snow Labs +name: bert_finetuned_squad_robkayinto_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_robkayinto_pipeline` is a English model originally trained by robkayinto. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_robkayinto_pipeline_en_5.5.0_3.0_1727042812823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_robkayinto_pipeline_en_5.5.0_3.0_1727042812823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_robkayinto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_robkayinto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_robkayinto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/robkayinto/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_swag_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_swag_pipeline_en.md new file mode 100644 index 00000000000000..3567687efcec6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_swag_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_swag_pipeline pipeline BertForQuestionAnswering from ashaduzzaman +author: John Snow Labs +name: bert_finetuned_swag_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_swag_pipeline` is a English model originally trained by ashaduzzaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_swag_pipeline_en_5.5.0_3.0_1726992369375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_swag_pipeline_en_5.5.0_3.0_1726992369375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_swag_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_swag_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_swag_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ashaduzzaman/bert-finetuned-swag + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_en.md new file mode 100644 index 00000000000000..bd5c02ce3d8a05 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_human_label_multiperspective_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_human_label_multiperspective BertForSequenceClassification from Multiperspective +author: John Snow Labs +name: bert_human_label_multiperspective +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_human_label_multiperspective` is a English model originally trained by Multiperspective. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_human_label_multiperspective_en_5.5.0_3.0_1727032927670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_human_label_multiperspective_en_5.5.0_3.0_1727032927670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_human_label_multiperspective","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_human_label_multiperspective", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_human_label_multiperspective| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Multiperspective/bert-human_label \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_en.md new file mode 100644 index 00000000000000..1d4107eee7bba6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_imdb_danielcd99 BertForSequenceClassification from danielcd99 +author: John Snow Labs +name: bert_imdb_danielcd99 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_imdb_danielcd99` is a English model originally trained by danielcd99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_imdb_danielcd99_en_5.5.0_3.0_1727032019543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_imdb_danielcd99_en_5.5.0_3.0_1727032019543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_imdb_danielcd99","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_imdb_danielcd99", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_imdb_danielcd99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/danielcd99/BERT_imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_pipeline_en.md new file mode 100644 index 00000000000000..f80d72e6632711 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_imdb_danielcd99_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_imdb_danielcd99_pipeline pipeline BertForSequenceClassification from danielcd99 +author: John Snow Labs +name: bert_imdb_danielcd99_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_imdb_danielcd99_pipeline` is a English model originally trained by danielcd99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_imdb_danielcd99_pipeline_en_5.5.0_3.0_1727032044936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_imdb_danielcd99_pipeline_en_5.5.0_3.0_1727032044936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_imdb_danielcd99_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_imdb_danielcd99_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_imdb_danielcd99_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/danielcd99/BERT_imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_large_portuguese_cased_assin2_entailment_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-22-bert_large_portuguese_cased_assin2_entailment_pipeline_pt.md new file mode 100644 index 00000000000000..d4891bf9723144 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_large_portuguese_cased_assin2_entailment_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese bert_large_portuguese_cased_assin2_entailment_pipeline pipeline BertForSequenceClassification from ruanchaves +author: John Snow Labs +name: bert_large_portuguese_cased_assin2_entailment_pipeline +date: 2024-09-22 +tags: [pt, open_source, pipeline, onnx] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_portuguese_cased_assin2_entailment_pipeline` is a Portuguese model originally trained by ruanchaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_cased_assin2_entailment_pipeline_pt_5.5.0_3.0_1727032551203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_cased_assin2_entailment_pipeline_pt_5.5.0_3.0_1727032551203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_portuguese_cased_assin2_entailment_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_portuguese_cased_assin2_entailment_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_portuguese_cased_assin2_entailment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ruanchaves/bert-large-portuguese-cased-assin2-entailment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en.md new file mode 100644 index 00000000000000..316cff87b2d429 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel BertForQuestionAnswering from ozlemsenel +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel` is a English model originally trained by ozlemsenel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en_5.5.0_3.0_1727049284300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel_en_5.5.0_3.0_1727049284300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_finetuned_squad_ozlemsenel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ozlemsenel/bert-large-uncased-whole-word-masking-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en.md new file mode 100644 index 00000000000000..fa1c92678248af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_racial_bias_model_80_0k_samples_fold_2_pipeline pipeline DistilBertForSequenceClassification from jamnik99 +author: John Snow Labs +name: bert_racial_bias_model_80_0k_samples_fold_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_racial_bias_model_80_0k_samples_fold_2_pipeline` is a English model originally trained by jamnik99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en_5.5.0_3.0_1727020992605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_racial_bias_model_80_0k_samples_fold_2_pipeline_en_5.5.0_3.0_1727020992605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_racial_bias_model_80_0k_samples_fold_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_racial_bias_model_80_0k_samples_fold_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_racial_bias_model_80_0k_samples_fold_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jamnik99/BERT-racial_bias_model_80.0K_samples_fold_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en.md new file mode 100644 index 00000000000000..9d1801a0341105 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en_5.5.0_3.0_1727012953910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_en_5.5.0_3.0_1727012953910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-llmOversight-0.5-noDropSus_6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en.md new file mode 100644 index 00000000000000..fdf51ca5d345e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en_5.5.0_3.0_1727012966303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline_en_5.5.0_3.0_1727012966303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_llmoversight_0_5_nodropsus_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-llmOversight-0.5-noDropSus_6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_en.md b/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_en.md new file mode 100644 index 00000000000000..28700d63d9f815 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bertmodel_thestriker117 DistilBertForSequenceClassification from thestriker117 +author: John Snow Labs +name: bertmodel_thestriker117 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertmodel_thestriker117` is a English model originally trained by thestriker117. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertmodel_thestriker117_en_5.5.0_3.0_1727020685521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertmodel_thestriker117_en_5.5.0_3.0_1727020685521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bertmodel_thestriker117","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bertmodel_thestriker117", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertmodel_thestriker117| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thestriker117/bertModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_pipeline_en.md new file mode 100644 index 00000000000000..d778964da3f8a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bertmodel_thestriker117_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertmodel_thestriker117_pipeline pipeline DistilBertForSequenceClassification from thestriker117 +author: John Snow Labs +name: bertmodel_thestriker117_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertmodel_thestriker117_pipeline` is a English model originally trained by thestriker117. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertmodel_thestriker117_pipeline_en_5.5.0_3.0_1727020697845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertmodel_thestriker117_pipeline_en_5.5.0_3.0_1727020697845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertmodel_thestriker117_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertmodel_thestriker117_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertmodel_thestriker117_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thestriker117/bertModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_en.md new file mode 100644 index 00000000000000..1fa84aefede819 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English binary_stock_classifier DistilBertForSequenceClassification from hkufyp2024 +author: John Snow Labs +name: binary_stock_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`binary_stock_classifier` is a English model originally trained by hkufyp2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/binary_stock_classifier_en_5.5.0_3.0_1727035396440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/binary_stock_classifier_en_5.5.0_3.0_1727035396440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("binary_stock_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("binary_stock_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|binary_stock_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hkufyp2024/binary-stock-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_pipeline_en.md new file mode 100644 index 00000000000000..2a4dcdefff4eae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-binary_stock_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English binary_stock_classifier_pipeline pipeline DistilBertForSequenceClassification from hkufyp2024 +author: John Snow Labs +name: binary_stock_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`binary_stock_classifier_pipeline` is a English model originally trained by hkufyp2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/binary_stock_classifier_pipeline_en_5.5.0_3.0_1727035409774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/binary_stock_classifier_pipeline_en_5.5.0_3.0_1727035409774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("binary_stock_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("binary_stock_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|binary_stock_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hkufyp2024/binary-stock-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bodo_roberta_base_sentencepiece_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-22-bodo_roberta_base_sentencepiece_mlm_en.md new file mode 100644 index 00000000000000..69b46a13e7eeea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bodo_roberta_base_sentencepiece_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bodo_roberta_base_sentencepiece_mlm RoBertaEmbeddings from alayaran +author: John Snow Labs +name: bodo_roberta_base_sentencepiece_mlm +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bodo_roberta_base_sentencepiece_mlm` is a English model originally trained by alayaran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bodo_roberta_base_sentencepiece_mlm_en_5.5.0_3.0_1726999903050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bodo_roberta_base_sentencepiece_mlm_en_5.5.0_3.0_1726999903050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bodo_roberta_base_sentencepiece_mlm","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bodo_roberta_base_sentencepiece_mlm","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bodo_roberta_base_sentencepiece_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/alayaran/bodo-roberta-base-sentencepiece-mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en.md new file mode 100644 index 00000000000000..378ab7d77ac32c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brand_classification_20240708_model_2_distilbert_0_980011_pipeline pipeline DistilBertForSequenceClassification from jointriple +author: John Snow Labs +name: brand_classification_20240708_model_2_distilbert_0_980011_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brand_classification_20240708_model_2_distilbert_0_980011_pipeline` is a English model originally trained by jointriple. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en_5.5.0_3.0_1727012247491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brand_classification_20240708_model_2_distilbert_0_980011_pipeline_en_5.5.0_3.0_1727012247491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brand_classification_20240708_model_2_distilbert_0_980011_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brand_classification_20240708_model_2_distilbert_0_980011_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brand_classification_20240708_model_2_distilbert_0_980011_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|255.5 MB| + +## References + +https://huggingface.co/jointriple/brand_classification_20240708_model_2_distilbert_0_980011 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en.md new file mode 100644 index 00000000000000..22e57c92fd6c60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_cecilia0409_pipeline pipeline RoBertaEmbeddings from Cecilia0409 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_cecilia0409_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_cecilia0409_pipeline` is a English model originally trained by Cecilia0409. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en_5.5.0_3.0_1727000032780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_cecilia0409_pipeline_en_5.5.0_3.0_1727000032780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_cecilia0409_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_cecilia0409_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_cecilia0409_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Cecilia0409/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model2_cippppy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model2_cippppy_pipeline_en.md new file mode 100644 index 00000000000000..bce4c51a74ca4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model2_cippppy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model2_cippppy_pipeline pipeline DistilBertForSequenceClassification from Cippppy +author: John Snow Labs +name: burmese_awesome_model2_cippppy_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model2_cippppy_pipeline` is a English model originally trained by Cippppy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_cippppy_pipeline_en_5.5.0_3.0_1727020437904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model2_cippppy_pipeline_en_5.5.0_3.0_1727020437904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model2_cippppy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model2_cippppy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model2_cippppy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|238.3 MB| + +## References + +https://huggingface.co/Cippppy/my_awesome_model2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_en.md new file mode 100644 index 00000000000000..a47a26b8a459e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_augmented_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_augmented DistilBertForSequenceClassification from Shozi +author: John Snow Labs +name: burmese_awesome_model_augmented +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_augmented` is a English model originally trained by Shozi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_augmented_en_5.5.0_3.0_1727033588312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_augmented_en_5.5.0_3.0_1727033588312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_augmented","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_augmented", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_augmented| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shozi/my_awesome_model_augmented \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_en.md new file mode 100644 index 00000000000000..6e3b244fe7ab6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_fabisor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_fabisor DistilBertForSequenceClassification from fabisor +author: John Snow Labs +name: burmese_awesome_model_fabisor +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_fabisor` is a English model originally trained by fabisor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fabisor_en_5.5.0_3.0_1726980113822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_fabisor_en_5.5.0_3.0_1726980113822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_fabisor","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_fabisor", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_fabisor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fabisor/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_en.md new file mode 100644 index 00000000000000..3a121b109dd4fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_ian_ailex DistilBertForSequenceClassification from Ian-AILex +author: John Snow Labs +name: burmese_awesome_model_ian_ailex +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ian_ailex` is a English model originally trained by Ian-AILex. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ian_ailex_en_5.5.0_3.0_1727021011983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ian_ailex_en_5.5.0_3.0_1727021011983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ian_ailex","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ian_ailex", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ian_ailex| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ian-AILex/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_pipeline_en.md new file mode 100644 index 00000000000000..8f0db49481f8c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_jimmy77777_pipeline pipeline DistilBertForSequenceClassification from Jimmy77777 +author: John Snow Labs +name: burmese_awesome_model_jimmy77777_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jimmy77777_pipeline` is a English model originally trained by Jimmy77777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jimmy77777_pipeline_en_5.5.0_3.0_1727033385634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jimmy77777_pipeline_en_5.5.0_3.0_1727033385634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_jimmy77777_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_jimmy77777_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jimmy77777_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jimmy77777/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_en.md new file mode 100644 index 00000000000000..6845b0e5dde2e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_miguelactc27 DistilBertForSequenceClassification from miguelactc27 +author: John Snow Labs +name: burmese_awesome_model_miguelactc27 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_miguelactc27` is a English model originally trained by miguelactc27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_miguelactc27_en_5.5.0_3.0_1727012435565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_miguelactc27_en_5.5.0_3.0_1727012435565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_miguelactc27","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_miguelactc27", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_miguelactc27| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/miguelactc27/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_pipeline_en.md new file mode 100644 index 00000000000000..9fb5981e2e5d3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_miguelactc27_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_miguelactc27_pipeline pipeline DistilBertForSequenceClassification from miguelactc27 +author: John Snow Labs +name: burmese_awesome_model_miguelactc27_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_miguelactc27_pipeline` is a English model originally trained by miguelactc27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_miguelactc27_pipeline_en_5.5.0_3.0_1727012448028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_miguelactc27_pipeline_en_5.5.0_3.0_1727012448028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_miguelactc27_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_miguelactc27_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_miguelactc27_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/miguelactc27/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_mou11209203_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_mou11209203_en.md new file mode 100644 index 00000000000000..dc5a63b7278f2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_mou11209203_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_mou11209203 DistilBertForSequenceClassification from Mou11209203 +author: John Snow Labs +name: burmese_awesome_model_mou11209203 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_mou11209203` is a English model originally trained by Mou11209203. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_mou11209203_en_5.5.0_3.0_1727012966134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_mou11209203_en_5.5.0_3.0_1727012966134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_mou11209203","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_mou11209203", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_mou11209203| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mou11209203/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_en.md new file mode 100644 index 00000000000000..0363bac41a3263 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_soilspoon DistilBertForSequenceClassification from soilSpoon +author: John Snow Labs +name: burmese_awesome_model_soilspoon +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_soilspoon` is a English model originally trained by soilSpoon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soilspoon_en_5.5.0_3.0_1727012336661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soilspoon_en_5.5.0_3.0_1727012336661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_soilspoon","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_soilspoon", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_soilspoon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/soilSpoon/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_pipeline_en.md new file mode 100644 index 00000000000000..276dd95f819c5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_soilspoon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_soilspoon_pipeline pipeline DistilBertForSequenceClassification from soilSpoon +author: John Snow Labs +name: burmese_awesome_model_soilspoon_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_soilspoon_pipeline` is a English model originally trained by soilSpoon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soilspoon_pipeline_en_5.5.0_3.0_1727012349220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_soilspoon_pipeline_en_5.5.0_3.0_1727012349220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_soilspoon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_soilspoon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_soilspoon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/soilSpoon/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_en.md new file mode 100644 index 00000000000000..fe79bb7f33c9f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_wwwjjj DistilBertForSequenceClassification from wwwjjj +author: John Snow Labs +name: burmese_awesome_model_wwwjjj +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_wwwjjj` is a English model originally trained by wwwjjj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_wwwjjj_en_5.5.0_3.0_1727035090275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_wwwjjj_en_5.5.0_3.0_1727035090275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_wwwjjj","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_wwwjjj", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_wwwjjj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wwwjjj/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_pipeline_en.md new file mode 100644 index 00000000000000..f48e07f1f95a55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_wwwjjj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_wwwjjj_pipeline pipeline DistilBertForSequenceClassification from wwwjjj +author: John Snow Labs +name: burmese_awesome_model_wwwjjj_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_wwwjjj_pipeline` is a English model originally trained by wwwjjj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_wwwjjj_pipeline_en_5.5.0_3.0_1727035103149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_wwwjjj_pipeline_en_5.5.0_3.0_1727035103149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_wwwjjj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_wwwjjj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_wwwjjj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wwwjjj/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_en.md new file mode 100644 index 00000000000000..1a2e2c6beda46e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_yjl814 DistilBertForSequenceClassification from YJL814 +author: John Snow Labs +name: burmese_awesome_model_yjl814 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_yjl814` is a English model originally trained by YJL814. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yjl814_en_5.5.0_3.0_1727020897825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yjl814_en_5.5.0_3.0_1727020897825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_yjl814","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_yjl814", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_yjl814| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/YJL814/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_pipeline_en.md new file mode 100644 index 00000000000000..9d9ad283496d3e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_yjl814_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_yjl814_pipeline pipeline DistilBertForSequenceClassification from YJL814 +author: John Snow Labs +name: burmese_awesome_model_yjl814_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_yjl814_pipeline` is a English model originally trained by YJL814. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yjl814_pipeline_en_5.5.0_3.0_1727020910119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_yjl814_pipeline_en_5.5.0_3.0_1727020910119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_yjl814_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_yjl814_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_yjl814_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/YJL814/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_qa_model_hellfox17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_qa_model_hellfox17_pipeline_en.md new file mode 100644 index 00000000000000..7491e53bfad285 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_qa_model_hellfox17_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_hellfox17_pipeline pipeline DistilBertForQuestionAnswering from Hellfox17 +author: John Snow Labs +name: burmese_awesome_qa_model_hellfox17_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_hellfox17_pipeline` is a English model originally trained by Hellfox17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hellfox17_pipeline_en_5.5.0_3.0_1726963753091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_hellfox17_pipeline_en_5.5.0_3.0_1726963753091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_hellfox17_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_hellfox17_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_hellfox17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Hellfox17/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_pipeline_en.md new file mode 100644 index 00000000000000..2e41d7cc3f29cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_bert_question_answering_model5_pipeline pipeline BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model5_pipeline` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model5_pipeline_en_5.5.0_3.0_1727039419547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model5_pipeline_en_5.5.0_3.0_1727039419547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_bert_question_answering_model5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_bert_question_answering_model5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model5 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_en.md new file mode 100644 index 00000000000000..bc844ee657ddfe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_fine_tuned_distilbert_lr_0_001 DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: burmese_fine_tuned_distilbert_lr_0_001 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_fine_tuned_distilbert_lr_0_001` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_0_001_en_5.5.0_3.0_1726979997437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_0_001_en_5.5.0_3.0_1726979997437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_fine_tuned_distilbert_lr_0_001","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_fine_tuned_distilbert_lr_0_001", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_fine_tuned_distilbert_lr_0_001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/Benuehlinger/my-fine-tuned-distilbert-lr-0.001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_pipeline_en.md new file mode 100644 index 00000000000000..c654be6a6839ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_0_001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_fine_tuned_distilbert_lr_0_001_pipeline pipeline DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: burmese_fine_tuned_distilbert_lr_0_001_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_fine_tuned_distilbert_lr_0_001_pipeline` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_0_001_pipeline_en_5.5.0_3.0_1726980013826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_0_001_pipeline_en_5.5.0_3.0_1726980013826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_fine_tuned_distilbert_lr_0_001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_fine_tuned_distilbert_lr_0_001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_fine_tuned_distilbert_lr_0_001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Benuehlinger/my-fine-tuned-distilbert-lr-0.001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_en.md new file mode 100644 index 00000000000000..c3e970cf5ed0d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English burmese_nepal_bhasa_model DistilBertForSequenceClassification from CohleM +author: John Snow Labs +name: burmese_nepal_bhasa_model +date: 2024-09-22 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_nepal_bhasa_model` is a English model originally trained by CohleM. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_en_5.5.0_3.0_1727012668739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_en_5.5.0_3.0_1727012668739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_nepal_bhasa_model","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_nepal_bhasa_model","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_nepal_bhasa_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +References + +https://huggingface.co/CohleM/my_new_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_en.md b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_en.md new file mode 100644 index 00000000000000..f2e80a6147df69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_iwcg_3 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_iwcg_3 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_iwcg_3` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_3_en_5.5.0_3.0_1727019075841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_3_en_5.5.0_3.0_1727019075841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iwcg_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iwcg_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_iwcg_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|430.8 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-iwcg-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_pipeline_en.md new file mode 100644 index 00000000000000..8d5a0ddd61a2ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_ner_iwcg_3_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_iwcg_3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_iwcg_3_pipeline` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_3_pipeline_en_5.5.0_3.0_1727019103263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_3_pipeline_en_5.5.0_3.0_1727019103263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cat_ner_iwcg_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cat_ner_iwcg_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_iwcg_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.9 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-iwcg-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_5_en.md b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_5_en.md new file mode 100644 index 00000000000000..90a8f50f19df20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cat_ner_iwcg_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_iwcg_5 XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_iwcg_5 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_iwcg_5` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_5_en_5.5.0_3.0_1726970024200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_iwcg_5_en_5.5.0_3.0_1726970024200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iwcg_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_iwcg_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_iwcg_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|430.8 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-iwcg-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_en.md new file mode 100644 index 00000000000000..cd9b6dd56f6d95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-chinese_extract_bert_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English chinese_extract_bert BertForQuestionAnswering from frett +author: John Snow Labs +name: chinese_extract_bert +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_extract_bert` is a English model originally trained by frett. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_extract_bert_en_5.5.0_3.0_1727039824473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_extract_bert_en_5.5.0_3.0_1727039824473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("chinese_extract_bert","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("chinese_extract_bert", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_extract_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/frett/chinese_extract_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-clasificadormotivomora_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-clasificadormotivomora_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..22755ab50b7fff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-clasificadormotivomora_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clasificadormotivomora_distilbert_pipeline pipeline DistilBertForSequenceClassification from Arodrigo +author: John Snow Labs +name: clasificadormotivomora_distilbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificadormotivomora_distilbert_pipeline` is a English model originally trained by Arodrigo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificadormotivomora_distilbert_pipeline_en_5.5.0_3.0_1727035506642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificadormotivomora_distilbert_pipeline_en_5.5.0_3.0_1727035506642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clasificadormotivomora_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clasificadormotivomora_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificadormotivomora_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Arodrigo/ClasificadorMotivoMora-Distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classification_en.md b/docs/_posts/ahmedlone127/2024-09-22-classification_en.md new file mode 100644 index 00000000000000..ba4d07657fd8c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classification DistilBertForSequenceClassification from MaX0214 +author: John Snow Labs +name: classification +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification` is a English model originally trained by MaX0214. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_en_5.5.0_3.0_1727020992972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_en_5.5.0_3.0_1727020992972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MaX0214/classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classification_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-classification_model_en.md new file mode 100644 index 00000000000000..055e16a26d4d18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classification_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classification_model RoBertaForSequenceClassification from skelley +author: John Snow Labs +name: classification_model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_model` is a English model originally trained by skelley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_model_en_5.5.0_3.0_1727017445980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_model_en_5.5.0_3.0_1727017445980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("classification_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("classification_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|421.5 MB| + +## References + +https://huggingface.co/skelley/classification_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classification_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-classification_model_pipeline_en.md new file mode 100644 index 00000000000000..d5e1510a9a0ad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classification_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classification_model_pipeline pipeline RoBertaForSequenceClassification from skelley +author: John Snow Labs +name: classification_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_model_pipeline` is a English model originally trained by skelley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_model_pipeline_en_5.5.0_3.0_1727017480089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_model_pipeline_en_5.5.0_3.0_1727017480089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classification_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classification_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.5 MB| + +## References + +https://huggingface.co/skelley/classification_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-classification_pipeline_en.md new file mode 100644 index 00000000000000..9634399f8e91b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classification_pipeline pipeline DistilBertForSequenceClassification from MaX0214 +author: John Snow Labs +name: classification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_pipeline` is a English model originally trained by MaX0214. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_pipeline_en_5.5.0_3.0_1727021006159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_pipeline_en_5.5.0_3.0_1727021006159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MaX0214/classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_en.md b/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_en.md new file mode 100644 index 00000000000000..3423598e3b1cc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-classifiereutoplevelroberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classifiereutoplevelroberta RoBertaForSequenceClassification from gianma +author: John Snow Labs +name: classifiereutoplevelroberta +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifiereutoplevelroberta` is a English model originally trained by gianma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifiereutoplevelroberta_en_5.5.0_3.0_1727037880863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifiereutoplevelroberta_en_5.5.0_3.0_1727037880863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("classifiereutoplevelroberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("classifiereutoplevelroberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifiereutoplevelroberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/gianma/classifierEUtopLevelRoberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_pipeline_en.md new file mode 100644 index 00000000000000..5ed5e55aa56721 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-clinicalbertqa_200_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English clinicalbertqa_200_pipeline pipeline BertForQuestionAnswering from lanzv +author: John Snow Labs +name: clinicalbertqa_200_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbertqa_200_pipeline` is a English model originally trained by lanzv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbertqa_200_pipeline_en_5.5.0_3.0_1727049214498.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbertqa_200_pipeline_en_5.5.0_3.0_1727049214498.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clinicalbertqa_200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clinicalbertqa_200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbertqa_200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/lanzv/ClinicalBERTQA_200 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-coha1940s_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-coha1940s_pipeline_en.md new file mode 100644 index 00000000000000..2e0dd6b43be444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-coha1940s_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English coha1940s_pipeline pipeline RoBertaEmbeddings from simonmun +author: John Snow Labs +name: coha1940s_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`coha1940s_pipeline` is a English model originally trained by simonmun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/coha1940s_pipeline_en_5.5.0_3.0_1726999720356.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/coha1940s_pipeline_en_5.5.0_3.0_1726999720356.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("coha1940s_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("coha1940s_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|coha1940s_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.8 MB| + +## References + +https://huggingface.co/simonmun/COHA1940s + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en.md b/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en.md new file mode 100644 index 00000000000000..8446985d2dd722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21 BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en_5.5.0_3.0_1727031327438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_en_5.5.0_3.0_1727031327438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_editorials_01_03_2022-15_50_21 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en.md new file mode 100644 index 00000000000000..59344cbc765962 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en_5.5.0_3.0_1727031347646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline_en_5.5.0_3.0_1727031347646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_editorials_01_03_2022_15_50_21_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_editorials_01_03_2022-15_50_21 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-covid_19_vaccination_tweet_stance_en.md b/docs/_posts/ahmedlone127/2024-09-22-covid_19_vaccination_tweet_stance_en.md new file mode 100644 index 00000000000000..4d3b8a1c2fd070 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-covid_19_vaccination_tweet_stance_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English covid_19_vaccination_tweet_stance BertForSequenceClassification from seantw +author: John Snow Labs +name: covid_19_vaccination_tweet_stance +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_19_vaccination_tweet_stance` is a English model originally trained by seantw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_19_vaccination_tweet_stance_en_5.5.0_3.0_1727007264751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_19_vaccination_tweet_stance_en_5.5.0_3.0_1727007264751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("covid_19_vaccination_tweet_stance","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("covid_19_vaccination_tweet_stance", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_19_vaccination_tweet_stance| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/seantw/covid-19-vaccination-tweet-stance \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_pipeline_zh.md new file mode 100644 index 00000000000000..6de659d347088d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese cross_encoder_roberta_wwm_ext_v1_pipeline pipeline BertForSequenceClassification from tuhailong +author: John Snow Labs +name: cross_encoder_roberta_wwm_ext_v1_pipeline +date: 2024-09-22 +tags: [zh, open_source, pipeline, onnx] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_encoder_roberta_wwm_ext_v1_pipeline` is a Chinese model originally trained by tuhailong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v1_pipeline_zh_5.5.0_3.0_1727034555787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v1_pipeline_zh_5.5.0_3.0_1727034555787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cross_encoder_roberta_wwm_ext_v1_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cross_encoder_roberta_wwm_ext_v1_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_encoder_roberta_wwm_ext_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|383.2 MB| + +## References + +https://huggingface.co/tuhailong/cross_encoder_roberta-wwm-ext_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_zh.md b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_zh.md new file mode 100644 index 00000000000000..15367ebdeceaad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v1_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese cross_encoder_roberta_wwm_ext_v1 BertForSequenceClassification from tuhailong +author: John Snow Labs +name: cross_encoder_roberta_wwm_ext_v1 +date: 2024-09-22 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_encoder_roberta_wwm_ext_v1` is a Chinese model originally trained by tuhailong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v1_zh_5.5.0_3.0_1727034536317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v1_zh_5.5.0_3.0_1727034536317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_roberta_wwm_ext_v1","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_roberta_wwm_ext_v1", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_encoder_roberta_wwm_ext_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.2 MB| + +## References + +https://huggingface.co/tuhailong/cross_encoder_roberta-wwm-ext_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v2_zh.md b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v2_zh.md new file mode 100644 index 00000000000000..02046f81444798 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cross_encoder_roberta_wwm_ext_v2_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese cross_encoder_roberta_wwm_ext_v2 BertForSequenceClassification from tuhailong +author: John Snow Labs +name: cross_encoder_roberta_wwm_ext_v2 +date: 2024-09-22 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cross_encoder_roberta_wwm_ext_v2` is a Chinese model originally trained by tuhailong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v2_zh_5.5.0_3.0_1726988810496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cross_encoder_roberta_wwm_ext_v2_zh_5.5.0_3.0_1726988810496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_roberta_wwm_ext_v2","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("cross_encoder_roberta_wwm_ext_v2", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cross_encoder_roberta_wwm_ext_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.2 MB| + +## References + +https://huggingface.co/tuhailong/cross_encoder_roberta-wwm-ext_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-crossencoder_airline_refine_en.md b/docs/_posts/ahmedlone127/2024-09-22-crossencoder_airline_refine_en.md new file mode 100644 index 00000000000000..ac61301b332383 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-crossencoder_airline_refine_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English crossencoder_airline_refine RoBertaForSequenceClassification from srmishra +author: John Snow Labs +name: crossencoder_airline_refine +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`crossencoder_airline_refine` is a English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/crossencoder_airline_refine_en_5.5.0_3.0_1727017385562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/crossencoder_airline_refine_en_5.5.0_3.0_1727017385562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("crossencoder_airline_refine","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("crossencoder_airline_refine", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|crossencoder_airline_refine| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/srmishra/crossencoder-airline-refine \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-custom_model_tweets_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-22-custom_model_tweets_sentiment_en.md new file mode 100644 index 00000000000000..127b611c8fb4a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-custom_model_tweets_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English custom_model_tweets_sentiment DistilBertForSequenceClassification from Doukan +author: John Snow Labs +name: custom_model_tweets_sentiment +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custom_model_tweets_sentiment` is a English model originally trained by Doukan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custom_model_tweets_sentiment_en_5.5.0_3.0_1727012342797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custom_model_tweets_sentiment_en_5.5.0_3.0_1727012342797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("custom_model_tweets_sentiment","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("custom_model_tweets_sentiment", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custom_model_tweets_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Doukan/custom-model-tweets-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_id.md b/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_id.md new file mode 100644 index 00000000000000..b798a3c8b3f5c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-cv9_special_batch12_lr6_small_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian cv9_special_batch12_lr6_small WhisperForCTC from TheRains +author: John Snow Labs +name: cv9_special_batch12_lr6_small +date: 2024-09-22 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cv9_special_batch12_lr6_small` is a Indonesian model originally trained by TheRains. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cv9_special_batch12_lr6_small_id_5.5.0_3.0_1727024528786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cv9_special_batch12_lr6_small_id_5.5.0_3.0_1727024528786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("cv9_special_batch12_lr6_small","id") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("cv9_special_batch12_lr6_small", "id") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cv9_special_batch12_lr6_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/TheRains/cv9-special-batch12-lr6-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_en.md new file mode 100644 index 00000000000000..32fdd2eeb2332d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English delete_only_filtering_hausa_v2 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: delete_only_filtering_hausa_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delete_only_filtering_hausa_v2` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delete_only_filtering_hausa_v2_en_5.5.0_3.0_1727018796486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delete_only_filtering_hausa_v2_en_5.5.0_3.0_1727018796486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("delete_only_filtering_hausa_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("delete_only_filtering_hausa_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delete_only_filtering_hausa_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/delete_only_filtering_hausa_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_pipeline_en.md new file mode 100644 index 00000000000000..020cb8bd0de0d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-delete_only_filtering_hausa_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English delete_only_filtering_hausa_v2_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: delete_only_filtering_hausa_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`delete_only_filtering_hausa_v2_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/delete_only_filtering_hausa_v2_pipeline_en_5.5.0_3.0_1727018842469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/delete_only_filtering_hausa_v2_pipeline_en_5.5.0_3.0_1727018842469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("delete_only_filtering_hausa_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("delete_only_filtering_hausa_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|delete_only_filtering_hausa_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/delete_only_filtering_hausa_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_en.md b/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_en.md new file mode 100644 index 00000000000000..ece39d1b9c11d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-disaster_tweet_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English disaster_tweet_4 RoBertaForSequenceClassification from aellxx +author: John Snow Labs +name: disaster_tweet_4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disaster_tweet_4` is a English model originally trained by aellxx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disaster_tweet_4_en_5.5.0_3.0_1727027012936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disaster_tweet_4_en_5.5.0_3.0_1727027012936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweet_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("disaster_tweet_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disaster_tweet_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/aellxx/disaster-tweet-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_en.md b/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_en.md new file mode 100644 index 00000000000000..88099c15fcd1f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distbert_cpcd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distbert_cpcd DistilBertForSequenceClassification from jnwnlee +author: John Snow Labs +name: distbert_cpcd +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distbert_cpcd` is a English model originally trained by jnwnlee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distbert_cpcd_en_5.5.0_3.0_1727035507773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distbert_cpcd_en_5.5.0_3.0_1727035507773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distbert_cpcd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distbert_cpcd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distbert_cpcd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jnwnlee/distbert_cpcd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_pipeline_xx.md new file mode 100644 index 00000000000000..351be8ce3ba058 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distil_multilingual_cased_fire_classification_silvanus_pipeline pipeline DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distil_multilingual_cased_fire_classification_silvanus_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_multilingual_cased_fire_classification_silvanus_pipeline` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_multilingual_cased_fire_classification_silvanus_pipeline_xx_5.5.0_3.0_1727020800633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_multilingual_cased_fire_classification_silvanus_pipeline_xx_5.5.0_3.0_1727020800633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distil_multilingual_cased_fire_classification_silvanus_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distil_multilingual_cased_fire_classification_silvanus_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_multilingual_cased_fire_classification_silvanus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distil-multilingual-cased-fire-classification-silvanus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_xx.md b/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_xx.md new file mode 100644 index 00000000000000..986dba1210f7e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distil_multilingual_cased_fire_classification_silvanus_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distil_multilingual_cased_fire_classification_silvanus DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distil_multilingual_cased_fire_classification_silvanus +date: 2024-09-22 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_multilingual_cased_fire_classification_silvanus` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_multilingual_cased_fire_classification_silvanus_xx_5.5.0_3.0_1727020778106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_multilingual_cased_fire_classification_silvanus_xx_5.5.0_3.0_1727020778106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_multilingual_cased_fire_classification_silvanus","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distil_multilingual_cased_fire_classification_silvanus", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_multilingual_cased_fire_classification_silvanus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distil-multilingual-cased-fire-classification-silvanus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert2_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert2_en.md new file mode 100644 index 00000000000000..ae04975b3918d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert2 DistilBertForSequenceClassification from deptage +author: John Snow Labs +name: distilbert2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert2` is a English model originally trained by deptage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert2_en_5.5.0_3.0_1727020768980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert2_en_5.5.0_3.0_1727020768980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deptage/distilbert2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert2_pipeline_en.md new file mode 100644 index 00000000000000..69e5b1df308b9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert2_pipeline pipeline DistilBertForSequenceClassification from deptage +author: John Snow Labs +name: distilbert2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert2_pipeline` is a English model originally trained by deptage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert2_pipeline_en_5.5.0_3.0_1727020780976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert2_pipeline_en_5.5.0_3.0_1727020780976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/deptage/distilbert2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_en.md new file mode 100644 index 00000000000000..fac85b5c69f336 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_07_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_07_3 DistilBertForSequenceClassification from KalaiselvanD +author: John Snow Labs +name: distilbert_07_3 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_07_3` is a English model originally trained by KalaiselvanD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_07_3_en_5.5.0_3.0_1727033245241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_07_3_en_5.5.0_3.0_1727033245241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_07_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_07_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_07_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KalaiselvanD/distilbert_07_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_en.md new file mode 100644 index 00000000000000..703fff9133d6cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_agnews_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding70model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding70model_en_5.5.0_3.0_1727020873003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding70model_en_5.5.0_3.0_1727020873003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding70model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding70model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_pipeline_en.md new file mode 100644 index 00000000000000..eadd7e2ed3d491 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_agnews_padding70model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_agnews_padding70model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding70model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding70model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding70model_pipeline_en_5.5.0_3.0_1727020886459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding70model_pipeline_en_5.5.0_3.0_1727020886459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_agnews_padding70model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_agnews_padding70model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding70model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding70model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_en.md new file mode 100644 index 00000000000000..109e7583cc4e23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_cased_sst2_ft DistilBertForSequenceClassification from EgehanEralp +author: John Snow Labs +name: distilbert_base_cased_sst2_ft +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_sst2_ft` is a English model originally trained by EgehanEralp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_sst2_ft_en_5.5.0_3.0_1726980633900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_sst2_ft_en_5.5.0_3.0_1726980633900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_sst2_ft","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_cased_sst2_ft", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_sst2_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/EgehanEralp/distilbert-base-cased-sst2-ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_en.md new file mode 100644 index 00000000000000..c88da951f58156 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_en.md @@ -0,0 +1,100 @@ +--- +layout: model +title: English distilbert_base DistilBertForSequenceClassification from zonghaoyang +author: John Snow Labs +name: distilbert_base +date: 2024-09-22 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base` is a English model originally trained by zonghaoyang. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_en_5.5.0_3.0_1727012755569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_en_5.5.0_3.0_1727012755569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +References + +https://huggingface.co/zonghaoyang/DistilBERT-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_en.md new file mode 100644 index 00000000000000..32249b127177a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_finetuned_imdb_sentiment DistilBertForSequenceClassification from lyrisha +author: John Snow Labs +name: distilbert_base_finetuned_imdb_sentiment +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_finetuned_imdb_sentiment` is a English model originally trained by lyrisha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_imdb_sentiment_en_5.5.0_3.0_1727033355896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_imdb_sentiment_en_5.5.0_3.0_1727033355896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_finetuned_imdb_sentiment","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_finetuned_imdb_sentiment", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_finetuned_imdb_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lyrisha/distilbert-base-finetuned-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx.md new file mode 100644 index 00000000000000..ee3cc0a36fc5ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline pipeline DistilBertForSequenceClassification from kingshukroy +author: John Snow Labs +name: distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline` is a Multilingual model originally trained by kingshukroy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx_5.5.0_3.0_1727012389617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline_xx_5.5.0_3.0_1727012389617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_hate_speech_ben_hin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/kingshukroy/distilbert-base-multilingual-cased-hate-speech-ben-hin + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_xx.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_xx.md new file mode 100644 index 00000000000000..85f4f923cfae02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_multilingual_cased_hate_speech_ben_hin_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_hate_speech_ben_hin DistilBertForSequenceClassification from kingshukroy +author: John Snow Labs +name: distilbert_base_multilingual_cased_hate_speech_ben_hin +date: 2024-09-22 +tags: [xx, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_hate_speech_ben_hin` is a Multilingual model originally trained by kingshukroy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_hate_speech_ben_hin_xx_5.5.0_3.0_1727012366945.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_hate_speech_ben_hin_xx_5.5.0_3.0_1727012366945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_hate_speech_ben_hin","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_multilingual_cased_hate_speech_ben_hin", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_hate_speech_ben_hin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/kingshukroy/distilbert-base-multilingual-cased-hate-speech-ben-hin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_pipeline_en.md new file mode 100644 index 00000000000000..883af245658092 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English distilbert_base_pipeline pipeline DistilBertForQuestionAnswering from KarthikAlagarsamy +author: John Snow Labs +name: distilbert_base_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_pipeline` is a English model originally trained by KarthikAlagarsamy. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_pipeline_en_5.5.0_3.0_1727012767605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_pipeline_en_5.5.0_3.0_1727012767605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("distilbert_base_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("distilbert_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/KarthikAlagarsamy/distilbert_base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_en.md new file mode 100644 index 00000000000000..d3d20921f55bed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_3epoch5 DistilBertForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: distilbert_base_uncased_3epoch5 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_3epoch5` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch5_en_5.5.0_3.0_1727035583979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch5_en_5.5.0_3.0_1727035583979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_3epoch5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_3epoch5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_3epoch5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dianamihalache27/distilbert-base-uncased_3epoch5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_en.md new file mode 100644 index 00000000000000..cc7ea63c5f4662 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_ahc_25000_1683475188_540369 DistilBertForSequenceClassification from alvingogo +author: John Snow Labs +name: distilbert_base_uncased_ahc_25000_1683475188_540369 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_ahc_25000_1683475188_540369` is a English model originally trained by alvingogo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ahc_25000_1683475188_540369_en_5.5.0_3.0_1727033492236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ahc_25000_1683475188_540369_en_5.5.0_3.0_1727033492236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ahc_25000_1683475188_540369","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_ahc_25000_1683475188_540369", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_ahc_25000_1683475188_540369| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/alvingogo/distilbert-base-uncased_AHC_25000_1683475188.540369 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en.md new file mode 100644 index 00000000000000..c44839eaa86064 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline pipeline DistilBertForSequenceClassification from alvingogo +author: John Snow Labs +name: distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline` is a English model originally trained by alvingogo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en_5.5.0_3.0_1727033507107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline_en_5.5.0_3.0_1727033507107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_ahc_25000_1683475188_540369_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/alvingogo/distilbert-base-uncased_AHC_25000_1683475188.540369 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..90b780ecab3320 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en_5.5.0_3.0_1727033813396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_en_5.5.0_3.0_1727033813396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut72ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_en.md new file mode 100644 index 00000000000000..57249a679298ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_cext_mypersonality DistilBertForSequenceClassification from holistic-ai +author: John Snow Labs +name: distilbert_base_uncased_cext_mypersonality +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_cext_mypersonality` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_cext_mypersonality_en_5.5.0_3.0_1727020952708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_cext_mypersonality_en_5.5.0_3.0_1727020952708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_cext_mypersonality","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_cext_mypersonality", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_cext_mypersonality| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/holistic-ai/distilbert-base-uncased_cEXT_mypersonality \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_pipeline_en.md new file mode 100644 index 00000000000000..10fc08cf415698 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_cext_mypersonality_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_cext_mypersonality_pipeline pipeline DistilBertForSequenceClassification from holistic-ai +author: John Snow Labs +name: distilbert_base_uncased_cext_mypersonality_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_cext_mypersonality_pipeline` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_cext_mypersonality_pipeline_en_5.5.0_3.0_1727020964404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_cext_mypersonality_pipeline_en_5.5.0_3.0_1727020964404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_cext_mypersonality_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_cext_mypersonality_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_cext_mypersonality_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/holistic-ai/distilbert-base-uncased_cEXT_mypersonality + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_en.md new file mode 100644 index 00000000000000..2d122d803df1d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_copn_mypersonality DistilBertForSequenceClassification from holistic-ai +author: John Snow Labs +name: distilbert_base_uncased_copn_mypersonality +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_copn_mypersonality` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_copn_mypersonality_en_5.5.0_3.0_1727020504713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_copn_mypersonality_en_5.5.0_3.0_1727020504713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_copn_mypersonality","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_copn_mypersonality", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_copn_mypersonality| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/holistic-ai/distilbert-base-uncased_cOPN_mypersonality \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_pipeline_en.md new file mode 100644 index 00000000000000..18cb28b8a671ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_copn_mypersonality_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_copn_mypersonality_pipeline pipeline DistilBertForSequenceClassification from holistic-ai +author: John Snow Labs +name: distilbert_base_uncased_copn_mypersonality_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_copn_mypersonality_pipeline` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_copn_mypersonality_pipeline_en_5.5.0_3.0_1727020517033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_copn_mypersonality_pipeline_en_5.5.0_3.0_1727020517033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_copn_mypersonality_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_copn_mypersonality_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_copn_mypersonality_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/holistic-ai/distilbert-base-uncased_cOPN_mypersonality + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_en.md new file mode 100644 index 00000000000000..ee42e4b81070ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_dro14 DistilBertForSequenceClassification from dro14 +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_dro14 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_dro14` is a English model originally trained by dro14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_dro14_en_5.5.0_3.0_1726980423303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_dro14_en_5.5.0_3.0_1726980423303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_dro14","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_clinc_dro14", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_dro14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dro14/distilbert-base-uncased-distilled-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_en.md new file mode 100644 index 00000000000000..1f2d625d0a3e5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0416_qingh001 DistilBertForSequenceClassification from qingh001 +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0416_qingh001 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0416_qingh001` is a English model originally trained by qingh001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_qingh001_en_5.5.0_3.0_1727033582713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_qingh001_en_5.5.0_3.0_1727033582713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0416_qingh001","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_ft_0416_qingh001", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0416_qingh001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/qingh001/distilbert-base-uncased_emotion_ft_0416 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en.md new file mode 100644 index 00000000000000..13e5e8c602a33d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline pipeline DistilBertForSequenceClassification from qingh001 +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline` is a English model originally trained by qingh001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en_5.5.0_3.0_1727033595627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline_en_5.5.0_3.0_1727033595627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0416_qingh001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/qingh001/distilbert-base-uncased_emotion_ft_0416 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en.md new file mode 100644 index 00000000000000..070ceefca4e3e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline pipeline DistilBertForSequenceClassification from b10401015 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline` is a English model originally trained by b10401015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en_5.5.0_3.0_1726980399346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline_en_5.5.0_3.0_1726980399346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_b10401015_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/b10401015/distilbert-base-uncased-finetuned-adl_hw1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_en.md new file mode 100644 index 00000000000000..46d9ebc7ea6405 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_jnwulff DistilBertForSequenceClassification from jnwulff +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_jnwulff +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_jnwulff` is a English model originally trained by jnwulff. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_jnwulff_en_5.5.0_3.0_1727033365345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_jnwulff_en_5.5.0_3.0_1727033365345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_jnwulff","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_jnwulff", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_jnwulff| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/jnwulff/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en.md new file mode 100644 index 00000000000000..a301feacfd5fe6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline pipeline DistilBertForSequenceClassification from jnwulff +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline` is a English model originally trained by jnwulff. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en_5.5.0_3.0_1727033378583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline_en_5.5.0_3.0_1727033378583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_jnwulff_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/jnwulff/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en.md new file mode 100644 index 00000000000000..73184bb96d9b79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark DistilBertForSequenceClassification from kwkwkwkwpark +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark` is a English model originally trained by kwkwkwkwpark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en_5.5.0_3.0_1727033870540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_en_5.5.0_3.0_1727033870540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/kwkwkwkwpark/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en.md new file mode 100644 index 00000000000000..baaf287141ac2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline pipeline DistilBertForSequenceClassification from kwkwkwkwpark +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline` is a English model originally trained by kwkwkwkwpark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en_5.5.0_3.0_1727033883520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline_en_5.5.0_3.0_1727033883520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_kwkwkwkwpark_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/kwkwkwkwpark/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en.md new file mode 100644 index 00000000000000..bea8723e06ab6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline pipeline DistilBertForSequenceClassification from MayBeHesham +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline` is a English model originally trained by MayBeHesham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en_5.5.0_3.0_1727012551607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline_en_5.5.0_3.0_1727012551607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_maybehesham_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/MayBeHesham/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_en.md new file mode 100644 index 00000000000000..d817c70515f481 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_sudaheng DistilBertForSequenceClassification from sudaheng +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_sudaheng +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_sudaheng` is a English model originally trained by sudaheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_sudaheng_en_5.5.0_3.0_1727012230175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_sudaheng_en_5.5.0_3.0_1727012230175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_sudaheng","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_sudaheng", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_sudaheng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/sudaheng/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en.md new file mode 100644 index 00000000000000..5337bdba93f8b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline pipeline DistilBertForSequenceClassification from sudaheng +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline` is a English model originally trained by sudaheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en_5.5.0_3.0_1727012242828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline_en_5.5.0_3.0_1727012242828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_sudaheng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/sudaheng/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_artoriasxv_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_artoriasxv_en.md new file mode 100644 index 00000000000000..e7664996611301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_artoriasxv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_artoriasxv DistilBertForSequenceClassification from ArtoriasXV +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_artoriasxv +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_artoriasxv` is a English model originally trained by ArtoriasXV. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_artoriasxv_en_5.5.0_3.0_1727012452925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_artoriasxv_en_5.5.0_3.0_1727012452925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_artoriasxv","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_artoriasxv", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_artoriasxv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ArtoriasXV/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_en.md new file mode 100644 index 00000000000000..bb6f3acc53a3ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_panzy0524 DistilBertForSequenceClassification from panzy0524 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_panzy0524 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_panzy0524` is a English model originally trained by panzy0524. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_panzy0524_en_5.5.0_3.0_1727012729722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_panzy0524_en_5.5.0_3.0_1727012729722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_panzy0524","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_panzy0524", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_panzy0524| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/panzy0524/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en.md new file mode 100644 index 00000000000000..025e86a1d5fb5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_panzy0524_pipeline pipeline DistilBertForSequenceClassification from panzy0524 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_panzy0524_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_panzy0524_pipeline` is a English model originally trained by panzy0524. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en_5.5.0_3.0_1727012742037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_panzy0524_pipeline_en_5.5.0_3.0_1727012742037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_panzy0524_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_panzy0524_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_panzy0524_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/panzy0524/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_en.md new file mode 100644 index 00000000000000..db3d10133dc609 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_ravikant22_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_ravikant22 DistilBertForSequenceClassification from ravikant22 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_ravikant22 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_ravikant22` is a English model originally trained by ravikant22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_ravikant22_en_5.5.0_3.0_1727020664780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_ravikant22_en_5.5.0_3.0_1727020664780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_ravikant22","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_ravikant22", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_ravikant22| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ravikant22/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_en.md new file mode 100644 index 00000000000000..7d622c98762498 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_anamelchor DistilBertForSequenceClassification from anamelchor +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_anamelchor +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_anamelchor` is a English model originally trained by anamelchor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_anamelchor_en_5.5.0_3.0_1727035401399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_anamelchor_en_5.5.0_3.0_1727035401399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_anamelchor","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_anamelchor", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_anamelchor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anamelchor/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en.md new file mode 100644 index 00000000000000..b810348675ac07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline pipeline DistilBertForSequenceClassification from anamelchor +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline` is a English model originally trained by anamelchor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en_5.5.0_3.0_1727035413745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline_en_5.5.0_3.0_1727035413745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_anamelchor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anamelchor/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en.md new file mode 100644 index 00000000000000..06fa2113da5ccd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline pipeline DistilBertForSequenceClassification from Beijaflor2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline` is a English model originally trained by Beijaflor2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en_5.5.0_3.0_1727012551851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline_en_5.5.0_3.0_1727012551851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_beijaflor2024_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Beijaflor2024/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_en.md new file mode 100644 index 00000000000000..cf91fe2418698c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_evernight017 DistilBertForSequenceClassification from evernight017 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_evernight017 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_evernight017` is a English model originally trained by evernight017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_evernight017_en_5.5.0_3.0_1727035059104.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_evernight017_en_5.5.0_3.0_1727035059104.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_evernight017","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_evernight017", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_evernight017| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/evernight017/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_en.md new file mode 100644 index 00000000000000..a0dbe86e43f8c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_gossk DistilBertForSequenceClassification from Gossk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_gossk +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_gossk` is a English model originally trained by Gossk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gossk_en_5.5.0_3.0_1726980616063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gossk_en_5.5.0_3.0_1726980616063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_gossk","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_gossk", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_gossk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gossk/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en.md new file mode 100644 index 00000000000000..7f6addfe0e0763 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_gossk_pipeline pipeline DistilBertForSequenceClassification from Gossk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_gossk_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_gossk_pipeline` is a English model originally trained by Gossk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en_5.5.0_3.0_1726980628189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_gossk_pipeline_en_5.5.0_3.0_1726980628189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_gossk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_gossk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_gossk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gossk/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_en.md new file mode 100644 index 00000000000000..35d8fdc0392fe3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jhtop88 DistilBertForSequenceClassification from jhtop88 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jhtop88 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jhtop88` is a English model originally trained by jhtop88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhtop88_en_5.5.0_3.0_1727035268264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhtop88_en_5.5.0_3.0_1727035268264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jhtop88","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jhtop88", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jhtop88| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jhtop88/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en.md new file mode 100644 index 00000000000000..24b8c0fe5cf522 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline pipeline DistilBertForSequenceClassification from jhtop88 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline` is a English model originally trained by jhtop88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en_5.5.0_3.0_1727035281992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline_en_5.5.0_3.0_1727035281992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jhtop88_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jhtop88/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_en.md new file mode 100644 index 00000000000000..e1ce2c4db81735 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_onelock DistilBertForSequenceClassification from onelock +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_onelock +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_onelock` is a English model originally trained by onelock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_onelock_en_5.5.0_3.0_1727035118067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_onelock_en_5.5.0_3.0_1727035118067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_onelock","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_onelock", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_onelock| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/onelock/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en.md new file mode 100644 index 00000000000000..c9516cdfdeecfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_onelock_pipeline pipeline DistilBertForSequenceClassification from onelock +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_onelock_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_onelock_pipeline` is a English model originally trained by onelock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en_5.5.0_3.0_1727035130478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_onelock_pipeline_en_5.5.0_3.0_1727035130478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_onelock_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_onelock_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_onelock_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/onelock/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_en.md new file mode 100644 index 00000000000000..aad4291bf7227a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_ouba_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ouba DistilBertForSequenceClassification from ouba +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ouba +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ouba` is a English model originally trained by ouba. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ouba_en_5.5.0_3.0_1727033817866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ouba_en_5.5.0_3.0_1727033817866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ouba","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ouba", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ouba| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ouba/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_en.md new file mode 100644 index 00000000000000..eb8f378fe5c607 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_overall_4_0 DistilBertForSequenceClassification from LeBruse +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_overall_4_0 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_overall_4_0` is a English model originally trained by LeBruse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_4_0_en_5.5.0_3.0_1727020578253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_4_0_en_5.5.0_3.0_1727020578253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_overall_4_0","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_overall_4_0", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_overall_4_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeBruse/distilbert-base-uncased-finetuned-emotion-overall-4.0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_en.md new file mode 100644 index 00000000000000..ff34994668758f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_thiago2608santana DistilBertForSequenceClassification from thiago2608santana +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_thiago2608santana +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_thiago2608santana` is a English model originally trained by thiago2608santana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thiago2608santana_en_5.5.0_3.0_1727012426126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thiago2608santana_en_5.5.0_3.0_1727012426126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_thiago2608santana","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_thiago2608santana", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_thiago2608santana| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thiago2608santana/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en.md new file mode 100644 index 00000000000000..c232639e32aaea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline pipeline DistilBertForSequenceClassification from thiago2608santana +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline` is a English model originally trained by thiago2608santana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en_5.5.0_3.0_1727012438354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline_en_5.5.0_3.0_1727012438354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_thiago2608santana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thiago2608santana/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en.md new file mode 100644 index 00000000000000..648bef82210cd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yamaguchi_kota DistilBertForSequenceClassification from yamaguchi-kota +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yamaguchi_kota +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yamaguchi_kota` is a English model originally trained by yamaguchi-kota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en_5.5.0_3.0_1727012343958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_en_5.5.0_3.0_1727012343958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yamaguchi_kota","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yamaguchi_kota", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yamaguchi_kota| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yamaguchi-kota/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en.md new file mode 100644 index 00000000000000..82f9062de5798b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline pipeline DistilBertForSequenceClassification from yashcfc +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline` is a English model originally trained by yashcfc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en_5.5.0_3.0_1727033153389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline_en_5.5.0_3.0_1727033153389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yashcfc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yashcfc/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en.md new file mode 100644 index 00000000000000..ff64c87a76b825 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_riukix_pipeline pipeline DistilBertForSequenceClassification from Riukix +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_riukix_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_riukix_pipeline` is a English model originally trained by Riukix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en_5.5.0_3.0_1726980241955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_riukix_pipeline_en_5.5.0_3.0_1726980241955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_riukix_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_riukix_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_riukix_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Riukix/distilbert-base-uncased-finetuned-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en.md new file mode 100644 index 00000000000000..e57f0e18909b89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en_5.5.0_3.0_1726980013874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline_en_5.5.0_3.0_1726980013874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_m_avoid_harm_seler_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-m_avoid_harm_seler + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en.md new file mode 100644 index 00000000000000..af0622b47782b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline pipeline DistilBertForSequenceClassification from ben-yu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline` is a English model originally trained by ben-yu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en_5.5.0_3.0_1727020582269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline_en_5.5.0_3.0_1727020582269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_nlp_letters_s1_s2_degendered_class_weighted_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-yu/distilbert-base-uncased-finetuned-nlp-letters-s1-s2-degendered-class-weighted + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_en.md new file mode 100644 index 00000000000000..6de9583f112de6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_feedback_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_feedback DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_feedback +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_feedback` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_feedback_en_5.5.0_3.0_1727012913300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_feedback_en_5.5.0_3.0_1727012913300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_feedback","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_feedback", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_feedback| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_feedback \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_en.md new file mode 100644 index 00000000000000..ff3b0a250f6a8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_product DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_product +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_product` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_product_en_5.5.0_3.0_1727035146952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_product_en_5.5.0_3.0_1727035146952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_product","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_t_product", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_product| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_product \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_pipeline_en.md new file mode 100644 index 00000000000000..fa16bd17668083 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_t_product_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_t_product_pipeline pipeline DistilBertForSequenceClassification from Gregorig +author: John Snow Labs +name: distilbert_base_uncased_finetuned_t_product_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_t_product_pipeline` is a English model originally trained by Gregorig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_product_pipeline_en_5.5.0_3.0_1727035159928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_t_product_pipeline_en_5.5.0_3.0_1727035159928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_t_product_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_t_product_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_t_product_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gregorig/distilbert-base-uncased-finetuned-t_product + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..7ec8f892e3be62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727020693294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727020693294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_idm_zphr_0st52sd_ut52ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..facbac303aac0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1727020705248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1727020705248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_idm_zphr_0st52sd_ut52ut1_plprefix0stlarge_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_idm_zphr_0st52sd_ut52ut1_PLPrefix0stlarge_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_en.md new file mode 100644 index 00000000000000..b0d15129052aae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_massive_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_massive_v1 DistilBertForSequenceClassification from benayas +author: John Snow Labs +name: distilbert_base_uncased_massive_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_massive_v1` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_massive_v1_en_5.5.0_3.0_1726980748700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_massive_v1_en_5.5.0_3.0_1726980748700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_massive_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_massive_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_massive_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|258.0 MB| + +## References + +https://huggingface.co/benayas/distilbert-base-uncased-massive-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..7fc6acf4afc86c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727033455305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727033455305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..7f177881234008 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727033389982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727033389982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut12ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..e790cb9733e186 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1727033403391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline_en_5.5.0_3.0_1727033403391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut12ut1_plprefix0stlarge_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut12ut1_PLPrefix0stlarge_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..23624fe213ce90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1727033153168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1727033153168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st1sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st1sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en.md new file mode 100644 index 00000000000000..ca549859903a76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en_5.5.0_3.0_1727035624423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_en_5.5.0_3.0_1727035624423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st3sd_ut52ut1_PLPrefix0stlarge3_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en.md new file mode 100644 index 00000000000000..9cab9c8ab1d257 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en_5.5.0_3.0_1727035636666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline_en_5.5.0_3.0_1727035636666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st3sd_ut52ut1_plprefix0stlarge3_simsp100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st3sd_ut52ut1_PLPrefix0stlarge3_simsp100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en.md new file mode 100644 index 00000000000000..518427dcf295f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en_5.5.0_3.0_1727033312225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline_en_5.5.0_3.0_1727033312225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge71_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge71_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en.md new file mode 100644 index 00000000000000..b1f490b4c7aa13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726980326234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_en_5.5.0_3.0_1726980326234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1largePfxNf_simsp300_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md new file mode 100644 index 00000000000000..73546597f36051 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1726980484367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1726980484367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_en.md new file mode 100644 index 00000000000000..45b845ad8ad0d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_sst2_v0 DistilBertForSequenceClassification from benayas +author: John Snow Labs +name: distilbert_base_uncased_sst2_v0 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sst2_v0` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sst2_v0_en_5.5.0_3.0_1727012888751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sst2_v0_en_5.5.0_3.0_1727012888751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sst2_v0","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sst2_v0", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sst2_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|252.0 MB| + +## References + +https://huggingface.co/benayas/distilbert-base-uncased-sst2-v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_pipeline_en.md new file mode 100644 index 00000000000000..56e50b758ee246 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_sst2_v0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_sst2_v0_pipeline pipeline DistilBertForSequenceClassification from benayas +author: John Snow Labs +name: distilbert_base_uncased_sst2_v0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sst2_v0_pipeline` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sst2_v0_pipeline_en_5.5.0_3.0_1727012901729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sst2_v0_pipeline_en_5.5.0_3.0_1727012901729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_sst2_v0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_sst2_v0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sst2_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|252.0 MB| + +## References + +https://huggingface.co/benayas/distilbert-base-uncased-sst2-v0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..ed3495a13e17cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en_5.5.0_3.0_1727021064570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_en_5.5.0_3.0_1727021064570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_en.md new file mode 100644 index 00000000000000..d2e6a6fa0dec26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_earningspersharediluted DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: distilbert_earningspersharediluted +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_earningspersharediluted` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_earningspersharediluted_en_5.5.0_3.0_1727012565786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_earningspersharediluted_en_5.5.0_3.0_1727012565786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_earningspersharediluted","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_earningspersharediluted", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_earningspersharediluted| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/distilbert_EarningsPerShareDiluted \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_pipeline_en.md new file mode 100644 index 00000000000000..8ea6307803a66e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_earningspersharediluted_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_earningspersharediluted_pipeline pipeline DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: distilbert_earningspersharediluted_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_earningspersharediluted_pipeline` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_earningspersharediluted_pipeline_en_5.5.0_3.0_1727012585255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_earningspersharediluted_pipeline_en_5.5.0_3.0_1727012585255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_earningspersharediluted_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_earningspersharediluted_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_earningspersharediluted_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/distilbert_EarningsPerShareDiluted + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_pipeline_en.md new file mode 100644 index 00000000000000..396f604c441a85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_emindurmus80_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_emindurmus80_pipeline pipeline DistilBertForSequenceClassification from EminDurmus80 +author: John Snow Labs +name: distilbert_emotion_emindurmus80_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_emindurmus80_pipeline` is a English model originally trained by EminDurmus80. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_emindurmus80_pipeline_en_5.5.0_3.0_1726980340867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_emindurmus80_pipeline_en_5.5.0_3.0_1726980340867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_emotion_emindurmus80_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_emotion_emindurmus80_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_emindurmus80_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EminDurmus80/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_en.md new file mode 100644 index 00000000000000..db8c4b38273899 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_ucuncubayram DistilBertForSequenceClassification from ucuncubayram +author: John Snow Labs +name: distilbert_emotion_ucuncubayram +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_ucuncubayram` is a English model originally trained by ucuncubayram. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_ucuncubayram_en_5.5.0_3.0_1727033132781.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_ucuncubayram_en_5.5.0_3.0_1727033132781.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_ucuncubayram","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_ucuncubayram", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_ucuncubayram| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ucuncubayram/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_pipeline_en.md new file mode 100644 index 00000000000000..080422d3624832 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_emotion_ucuncubayram_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_ucuncubayram_pipeline pipeline DistilBertForSequenceClassification from ucuncubayram +author: John Snow Labs +name: distilbert_emotion_ucuncubayram_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_ucuncubayram_pipeline` is a English model originally trained by ucuncubayram. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_ucuncubayram_pipeline_en_5.5.0_3.0_1727033155933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_ucuncubayram_pipeline_en_5.5.0_3.0_1727033155933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_emotion_ucuncubayram_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_emotion_ucuncubayram_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_ucuncubayram_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ucuncubayram/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_en.md new file mode 100644 index 00000000000000..37e6e758d8db76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sentiment_omalve DistilBertForSequenceClassification from OmAlve +author: John Snow Labs +name: distilbert_finetuned_imdb_sentiment_omalve +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sentiment_omalve` is a English model originally trained by OmAlve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_omalve_en_5.5.0_3.0_1727035180915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_omalve_en_5.5.0_3.0_1727035180915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_omalve","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_finetuned_imdb_sentiment_omalve", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sentiment_omalve| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/OmAlve/distilbert-finetuned-imdb-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_food_hightensan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_food_hightensan_pipeline_en.md new file mode 100644 index 00000000000000..ce5c946019d0d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_food_hightensan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_food_hightensan_pipeline pipeline DistilBertForSequenceClassification from hightensan +author: John Snow Labs +name: distilbert_food_hightensan_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_food_hightensan_pipeline` is a English model originally trained by hightensan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_food_hightensan_pipeline_en_5.5.0_3.0_1727020625387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_food_hightensan_pipeline_en_5.5.0_3.0_1727020625387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_food_hightensan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_food_hightensan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_food_hightensan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hightensan/distilbert-food + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_en.md new file mode 100644 index 00000000000000..414df34d8c791a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_foundation_category_c5_finetune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_foundation_category_c5_finetune DistilBertForSequenceClassification from eric-mc2 +author: John Snow Labs +name: distilbert_foundation_category_c5_finetune +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_foundation_category_c5_finetune` is a English model originally trained by eric-mc2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c5_finetune_en_5.5.0_3.0_1727033489081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_c5_finetune_en_5.5.0_3.0_1727033489081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_c5_finetune","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_foundation_category_c5_finetune", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_foundation_category_c5_finetune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eric-mc2/distilbert-foundation-category-c5-finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_en.md new file mode 100644 index 00000000000000..1908f116c71620 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_lora_false_adapted_augment_false DistilBertForSequenceClassification from EmiMule +author: John Snow Labs +name: distilbert_lora_false_adapted_augment_false +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lora_false_adapted_augment_false` is a English model originally trained by EmiMule. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lora_false_adapted_augment_false_en_5.5.0_3.0_1727012552082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lora_false_adapted_augment_false_en_5.5.0_3.0_1727012552082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lora_false_adapted_augment_false","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_lora_false_adapted_augment_false", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lora_false_adapted_augment_false| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EmiMule/distilbert-LoRA-False-adapted-augment-False \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_pipeline_en.md new file mode 100644 index 00000000000000..2a208d91b4c831 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_lora_false_adapted_augment_false_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_lora_false_adapted_augment_false_pipeline pipeline DistilBertForSequenceClassification from EmiMule +author: John Snow Labs +name: distilbert_lora_false_adapted_augment_false_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lora_false_adapted_augment_false_pipeline` is a English model originally trained by EmiMule. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lora_false_adapted_augment_false_pipeline_en_5.5.0_3.0_1727012565402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lora_false_adapted_augment_false_pipeline_en_5.5.0_3.0_1727012565402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_lora_false_adapted_augment_false_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_lora_false_adapted_augment_false_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lora_false_adapted_augment_false_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EmiMule/distilbert-LoRA-False-adapted-augment-False + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en.md new file mode 100644 index 00000000000000..a110826ba36b48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en_5.5.0_3.0_1727033794920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline_en_5.5.0_3.0_1727033794920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_mnli_192 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en.md new file mode 100644 index 00000000000000..9a26cf333a73e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en_5.5.0_3.0_1726980508595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_en_5.5.0_3.0_1726980508595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_mnli_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en.md new file mode 100644 index 00000000000000..896a332b21dad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en_5.5.0_3.0_1727020582154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline_en_5.5.0_3.0_1727020582154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qqp_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en.md new file mode 100644 index 00000000000000..75b5ec20e7035e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en_5.5.0_3.0_1727033691889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_en_5.5.0_3.0_1727033691889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qqp_384 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en.md new file mode 100644 index 00000000000000..c6727fcef6edf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en_5.5.0_3.0_1727033698301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline_en_5.5.0_3.0_1727033698301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_384_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qqp_384 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..adaedf6039e97f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sst2_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sst2_padding20model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst2_padding20model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst2_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding20model_pipeline_en_5.5.0_3.0_1726980736718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst2_padding20model_pipeline_en_5.5.0_3.0_1726980736718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sst2_padding20model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sst2_padding20model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst2_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst2_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_en.md new file mode 100644 index 00000000000000..d0f8f6ddf2277a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding60model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding60model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding60model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding60model_en_5.5.0_3.0_1727035498696.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding60model_en_5.5.0_3.0_1727035498696.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding60model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding60model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding60model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding60model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_pipeline_en.md new file mode 100644 index 00000000000000..fdae303e0839ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding60model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_twitterfin_padding60model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding60model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding60model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding60model_pipeline_en_5.5.0_3.0_1727035511041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding60model_pipeline_en_5.5.0_3.0_1727035511041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_twitterfin_padding60model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_twitterfin_padding60model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding60model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding60model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding80model_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding80model_en.md new file mode 100644 index 00000000000000..fe49c45a43287c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_twitterfin_padding80model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding80model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding80model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding80model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding80model_en_5.5.0_3.0_1727033948183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding80model_en_5.5.0_3.0_1727033948183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding80model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding80model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding80model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding80model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_en.md new file mode 100644 index 00000000000000..95db676c196916 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_condition_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_condition_classifier RoBertaForSequenceClassification from BanUrsus +author: John Snow Labs +name: distilroberta_base_finetuned_condition_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_condition_classifier` is a English model originally trained by BanUrsus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_condition_classifier_en_5.5.0_3.0_1727026411869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_condition_classifier_en_5.5.0_3.0_1727026411869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_finetuned_condition_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_base_finetuned_condition_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_condition_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|311.3 MB| + +## References + +https://huggingface.co/BanUrsus/distilroberta-base-finetuned-condition-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_sst2_distilled_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_sst2_distilled_pipeline_en.md new file mode 100644 index 00000000000000..282feba8a7a697 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_sst2_distilled_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_sst2_distilled_pipeline pipeline RoBertaForSequenceClassification from aal2015 +author: John Snow Labs +name: distilroberta_base_sst2_distilled_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_sst2_distilled_pipeline` is a English model originally trained by aal2015. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_sst2_distilled_pipeline_en_5.5.0_3.0_1726972146442.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_sst2_distilled_pipeline_en_5.5.0_3.0_1726972146442.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_sst2_distilled_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_sst2_distilled_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_sst2_distilled_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/aal2015/distilroberta-base-sst2-distilled + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_finetuned_financial_text_regression_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_finetuned_financial_text_regression_en.md new file mode 100644 index 00000000000000..97b4974bab91dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_finetuned_financial_text_regression_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_finetuned_financial_text_regression RoBertaForSequenceClassification from lwat64 +author: John Snow Labs +name: distilroberta_finetuned_financial_text_regression +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_finetuned_financial_text_regression` is a English model originally trained by lwat64. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_finetuned_financial_text_regression_en_5.5.0_3.0_1726972135237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_finetuned_financial_text_regression_en_5.5.0_3.0_1726972135237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_finetuned_financial_text_regression","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("distilroberta_finetuned_financial_text_regression", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_finetuned_financial_text_regression| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/lwat64/distilroberta-finetuned-financial-text-regression \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_pipeline_ca.md b/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_pipeline_ca.md new file mode 100644 index 00000000000000..f5d56a94f5d6b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_pipeline_ca.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Catalan, Valencian drug_ner_cat_v1_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: drug_ner_cat_v1_pipeline +date: 2024-09-22 +tags: [ca, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ca +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`drug_ner_cat_v1_pipeline` is a Catalan, Valencian model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/drug_ner_cat_v1_pipeline_ca_5.5.0_3.0_1727048508541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/drug_ner_cat_v1_pipeline_ca_5.5.0_3.0_1727048508541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("drug_ner_cat_v1_pipeline", lang = "ca") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("drug_ner_cat_v1_pipeline", lang = "ca") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|drug_ner_cat_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ca| +|Size:|436.0 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/drug-ner-cat-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_nl.md b/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_nl.md new file mode 100644 index 00000000000000..1fc0ffdcf22487 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_nl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Dutch, Flemish echocardiogram_latvian_dilation_reduced RoBertaForSequenceClassification from UMCU +author: John Snow Labs +name: echocardiogram_latvian_dilation_reduced +date: 2024-09-22 +tags: [nl, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`echocardiogram_latvian_dilation_reduced` is a Dutch, Flemish model originally trained by UMCU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/echocardiogram_latvian_dilation_reduced_nl_5.5.0_3.0_1727017594069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/echocardiogram_latvian_dilation_reduced_nl_5.5.0_3.0_1727017594069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("echocardiogram_latvian_dilation_reduced","nl") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("echocardiogram_latvian_dilation_reduced", "nl") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|echocardiogram_latvian_dilation_reduced| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|nl| +|Size:|472.0 MB| + +## References + +https://huggingface.co/UMCU/Echocardiogram_LV_dilation_reduced \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_pipeline_nl.md new file mode 100644 index 00000000000000..46f297d18913e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-echocardiogram_latvian_dilation_reduced_pipeline_nl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dutch, Flemish echocardiogram_latvian_dilation_reduced_pipeline pipeline RoBertaForSequenceClassification from UMCU +author: John Snow Labs +name: echocardiogram_latvian_dilation_reduced_pipeline +date: 2024-09-22 +tags: [nl, open_source, pipeline, onnx] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`echocardiogram_latvian_dilation_reduced_pipeline` is a Dutch, Flemish model originally trained by UMCU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/echocardiogram_latvian_dilation_reduced_pipeline_nl_5.5.0_3.0_1727017617462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/echocardiogram_latvian_dilation_reduced_pipeline_nl_5.5.0_3.0_1727017617462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("echocardiogram_latvian_dilation_reduced_pipeline", lang = "nl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("echocardiogram_latvian_dilation_reduced_pipeline", lang = "nl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|echocardiogram_latvian_dilation_reduced_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|472.0 MB| + +## References + +https://huggingface.co/UMCU/Echocardiogram_LV_dilation_reduced + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-eee_en.md b/docs/_posts/ahmedlone127/2024-09-22-eee_en.md new file mode 100644 index 00000000000000..3c53cf7e317f4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-eee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English eee RoBertaForSequenceClassification from weicap +author: John Snow Labs +name: eee +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eee` is a English model originally trained by weicap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eee_en_5.5.0_3.0_1727017509815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eee_en_5.5.0_3.0_1727017509815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("eee","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("eee", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.9 MB| + +## References + +https://huggingface.co/weicap/eee \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-eee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-eee_pipeline_en.md new file mode 100644 index 00000000000000..bf9169fe7193aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-eee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English eee_pipeline pipeline RoBertaForSequenceClassification from weicap +author: John Snow Labs +name: eee_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eee_pipeline` is a English model originally trained by weicap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eee_pipeline_en_5.5.0_3.0_1727017535371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eee_pipeline_en_5.5.0_3.0_1727017535371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("eee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("eee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.9 MB| + +## References + +https://huggingface.co/weicap/eee + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_en.md new file mode 100644 index 00000000000000..c93613cb8d1c9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random0_seed0_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed0_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed0_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bertweet_large_en_5.5.0_3.0_1727027210427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bertweet_large_en_5.5.0_3.0_1727027210427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed0_bertweet_large","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random0_seed0_bertweet_large", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed0_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed0-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..4e101b515b9caa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random0_seed0_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random0_seed0_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random0_seed0_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random0_seed0_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727027282592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random0_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727027282592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emoji_emoji_random0_seed0_bertweet_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emoji_emoji_random0_seed0_bertweet_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random0_seed0_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random0_seed0-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en.md new file mode 100644 index 00000000000000..b77d76f6d73d2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1727027531103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_en_5.5.0_3.0_1727027531103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random3_seed0-twitter-roberta-base-2021-124m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en.md new file mode 100644 index 00000000000000..1069df8b07aaf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1727027554029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1727027554029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random3_seed0_twitter_roberta_base_2021_124m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random3_seed0-twitter-roberta-base-2021-124m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..ad279cac9a48b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1727037385150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1727037385150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emoji_emoji_random3_seed1_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.6 MB| + +## References + +https://huggingface.co/tweettemposhift/emoji-emoji_random3_seed1-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emotion_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-emotion_model_en.md new file mode 100644 index 00000000000000..01561989e16bed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emotion_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotion_model DistilBertForSequenceClassification from naamalia23 +author: John Snow Labs +name: emotion_model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_model` is a English model originally trained by naamalia23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_model_en_5.5.0_3.0_1727035394131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_model_en_5.5.0_3.0_1727035394131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("emotion_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/naamalia23/emotion_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-emotion_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-emotion_model_pipeline_en.md new file mode 100644 index 00000000000000..c9fb62915141dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-emotion_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_model_pipeline pipeline DistilBertForSequenceClassification from naamalia23 +author: John Snow Labs +name: emotion_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_model_pipeline` is a English model originally trained by naamalia23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_model_pipeline_en_5.5.0_3.0_1727035406491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_model_pipeline_en_5.5.0_3.0_1727035406491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emotion_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emotion_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/naamalia23/emotion_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-eq_bert_v1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-eq_bert_v1_1_pipeline_en.md new file mode 100644 index 00000000000000..96e77a7e8ecddb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-eq_bert_v1_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English eq_bert_v1_1_pipeline pipeline BertEmbeddings from RyotaroOKabe +author: John Snow Labs +name: eq_bert_v1_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`eq_bert_v1_1_pipeline` is a English model originally trained by RyotaroOKabe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/eq_bert_v1_1_pipeline_en_5.5.0_3.0_1726973584645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/eq_bert_v1_1_pipeline_en_5.5.0_3.0_1726973584645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("eq_bert_v1_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("eq_bert_v1_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|eq_bert_v1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/RyotaroOKabe/eq_bert_v1.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_en.md new file mode 100644 index 00000000000000..00dbbec0ee7b46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English films_hate_offensive_roberta RoBertaForSequenceClassification from esmarquez17 +author: John Snow Labs +name: films_hate_offensive_roberta +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`films_hate_offensive_roberta` is a English model originally trained by esmarquez17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/films_hate_offensive_roberta_en_5.5.0_3.0_1726972212863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/films_hate_offensive_roberta_en_5.5.0_3.0_1726972212863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("films_hate_offensive_roberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("films_hate_offensive_roberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|films_hate_offensive_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|448.5 MB| + +## References + +https://huggingface.co/esmarquez17/films-hate-offensive-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en.md new file mode 100644 index 00000000000000..d98359a84519b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline pipeline RoBertaEmbeddings from manucos +author: John Snow Labs +name: final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en_5.5.0_3.0_1726999522571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline_en_5.5.0_3.0_1726999522571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_ft__roberta_clinical_wl_spanish__70k_ultrasounds_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/manucos/final-ft__roberta-clinical-wl-es__70k-ultrasounds + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_model_mkbackup_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_model_mkbackup_en.md new file mode 100644 index 00000000000000..146822dc15f2d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_model_mkbackup_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English final_model_mkbackup WhisperForCTC from mkbackup +author: John Snow Labs +name: final_model_mkbackup +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_model_mkbackup` is a English model originally trained by mkbackup. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_model_mkbackup_en_5.5.0_3.0_1726985377035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_model_mkbackup_en_5.5.0_3.0_1726985377035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("final_model_mkbackup","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("final_model_mkbackup", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_model_mkbackup| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mkbackup/final_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_en.md new file mode 100644 index 00000000000000..98e5b916cda19c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English final_whisper_for_initial_publish_v2 WhisperForCTC from AsemBadr +author: John Snow Labs +name: final_whisper_for_initial_publish_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_whisper_for_initial_publish_v2` is a English model originally trained by AsemBadr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_v2_en_5.5.0_3.0_1727023247076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_v2_en_5.5.0_3.0_1727023247076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("final_whisper_for_initial_publish_v2","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("final_whisper_for_initial_publish_v2", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_whisper_for_initial_publish_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AsemBadr/final-whisper-for-initial-publish-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_pipeline_en.md new file mode 100644 index 00000000000000..64925be6693dc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_whisper_for_initial_publish_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English final_whisper_for_initial_publish_v2_pipeline pipeline WhisperForCTC from AsemBadr +author: John Snow Labs +name: final_whisper_for_initial_publish_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_whisper_for_initial_publish_v2_pipeline` is a English model originally trained by AsemBadr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_v2_pipeline_en_5.5.0_3.0_1727023335601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_whisper_for_initial_publish_v2_pipeline_en_5.5.0_3.0_1727023335601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("final_whisper_for_initial_publish_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("final_whisper_for_initial_publish_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_whisper_for_initial_publish_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/AsemBadr/final-whisper-for-initial-publish-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_en.md new file mode 100644 index 00000000000000..94d4b728b6ddf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_demo_2x DistilBertForSequenceClassification from nardellu +author: John Snow Labs +name: finetuned_demo_2x +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2x` is a English model originally trained by nardellu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2x_en_5.5.0_3.0_1727020657927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2x_en_5.5.0_3.0_1727020657927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2x","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_demo_2x", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2x| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nardellu/finetuned_demo_2X \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_pipeline_en.md new file mode 100644 index 00000000000000..a6cd3deb1bd756 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuned_sentiment_model_imdb_distilbert_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_sentiment_model_imdb_distilbert_2_pipeline pipeline DistilBertForSequenceClassification from Tzimon +author: John Snow Labs +name: finetuned_sentiment_model_imdb_distilbert_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sentiment_model_imdb_distilbert_2_pipeline` is a English model originally trained by Tzimon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_model_imdb_distilbert_2_pipeline_en_5.5.0_3.0_1727020981741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sentiment_model_imdb_distilbert_2_pipeline_en_5.5.0_3.0_1727020981741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_sentiment_model_imdb_distilbert_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_sentiment_model_imdb_distilbert_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sentiment_model_imdb_distilbert_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Tzimon/finetuned_sentiment_model_imdb_distilbert_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_11000_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_11000_samples_pipeline_en.md new file mode 100644 index 00000000000000..48c873ec6f5106 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_11000_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_11000_samples_pipeline pipeline DistilBertForSequenceClassification from ZainabNac +author: John Snow Labs +name: finetuning_sentiment_model_11000_samples_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_11000_samples_pipeline` is a English model originally trained by ZainabNac. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_11000_samples_pipeline_en_5.5.0_3.0_1726980534579.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_11000_samples_pipeline_en_5.5.0_3.0_1726980534579.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_11000_samples_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_11000_samples_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_11000_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ZainabNac/finetuning-sentiment-model-11000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_en.md new file mode 100644 index 00000000000000..9e2ff3dc02fc97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ammarasmro DistilBertForSequenceClassification from ammarasmro +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ammarasmro +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ammarasmro` is a English model originally trained by ammarasmro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ammarasmro_en_5.5.0_3.0_1727012797675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ammarasmro_en_5.5.0_3.0_1727012797675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ammarasmro","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ammarasmro", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ammarasmro| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ammarasmro/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_en.md new file mode 100644 index 00000000000000..00dc82ba2110d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_bianchidev_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bianchidev DistilBertForSequenceClassification from BianchiDev +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bianchidev +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bianchidev` is a English model originally trained by BianchiDev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bianchidev_en_5.5.0_3.0_1727020687554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bianchidev_en_5.5.0_3.0_1727020687554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bianchidev","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bianchidev", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bianchidev| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BianchiDev/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_jimbo4794_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_jimbo4794_en.md new file mode 100644 index 00000000000000..61eb30a2f1a2e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_jimbo4794_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_jimbo4794 DistilBertForSequenceClassification from Jimbo4794 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_jimbo4794 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_jimbo4794` is a English model originally trained by Jimbo4794. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jimbo4794_en_5.5.0_3.0_1727021134459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jimbo4794_en_5.5.0_3.0_1727021134459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_jimbo4794","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_jimbo4794", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_jimbo4794| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jimbo4794/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en.md new file mode 100644 index 00000000000000..9f5e66bc79d42a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline pipeline DistilBertForSequenceClassification from liujiajiaee +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline` is a English model originally trained by liujiajiaee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en_5.5.0_3.0_1726980307608.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline_en_5.5.0_3.0_1726980307608.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_liujiajiaee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/liujiajiaee/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_en.md new file mode 100644 index 00000000000000..3a0fddf865e761 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_samples_yvillamil DistilBertForSequenceClassification from yvillamil +author: John Snow Labs +name: finetuning_sentiment_model_5000_samples_yvillamil +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_samples_yvillamil` is a English model originally trained by yvillamil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_yvillamil_en_5.5.0_3.0_1727035271767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_yvillamil_en_5.5.0_3.0_1727035271767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_samples_yvillamil","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_samples_yvillamil", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_samples_yvillamil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yvillamil/finetuning-sentiment-model-5000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en.md new file mode 100644 index 00000000000000..804318d441d5d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_samples_yvillamil_pipeline pipeline DistilBertForSequenceClassification from yvillamil +author: John Snow Labs +name: finetuning_sentiment_model_5000_samples_yvillamil_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_samples_yvillamil_pipeline` is a English model originally trained by yvillamil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en_5.5.0_3.0_1727035284169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_samples_yvillamil_pipeline_en_5.5.0_3.0_1727035284169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_samples_yvillamil_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_samples_yvillamil_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_samples_yvillamil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yvillamil/finetuning-sentiment-model-5000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_pipeline_en.md new file mode 100644 index 00000000000000..f3060157850681 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_nerproject7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_nerproject7_pipeline pipeline DistilBertForSequenceClassification from nerproject7 +author: John Snow Labs +name: finetuning_sentiment_model_nerproject7_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_nerproject7_pipeline` is a English model originally trained by nerproject7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_nerproject7_pipeline_en_5.5.0_3.0_1726980216523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_nerproject7_pipeline_en_5.5.0_3.0_1726980216523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_nerproject7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_nerproject7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_nerproject7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nerproject7/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_en.md new file mode 100644 index 00000000000000..868b6638e5cbb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_qiuxuan DistilBertForSequenceClassification from Qiuxuan +author: John Snow Labs +name: finetuning_sentiment_model_qiuxuan +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_qiuxuan` is a English model originally trained by Qiuxuan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_qiuxuan_en_5.5.0_3.0_1726980555617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_qiuxuan_en_5.5.0_3.0_1726980555617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_qiuxuan","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_qiuxuan", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_qiuxuan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Qiuxuan/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_deberta_bert_score_en.md b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_deberta_bert_score_en.md new file mode 100644 index 00000000000000..8142016044be9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_deberta_bert_score_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English frugalscore_medium_deberta_bert_score BertForSequenceClassification from moussaKam +author: John Snow Labs +name: frugalscore_medium_deberta_bert_score +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frugalscore_medium_deberta_bert_score` is a English model originally trained by moussaKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frugalscore_medium_deberta_bert_score_en_5.5.0_3.0_1727031985606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frugalscore_medium_deberta_bert_score_en_5.5.0_3.0_1727031985606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("frugalscore_medium_deberta_bert_score","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("frugalscore_medium_deberta_bert_score", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frugalscore_medium_deberta_bert_score| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|155.2 MB| + +## References + +https://huggingface.co/moussaKam/frugalscore_medium_deberta_bert-score \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_en.md b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_en.md new file mode 100644 index 00000000000000..93486b94dd309b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English frugalscore_medium_roberta_bert_score BertForSequenceClassification from moussaKam +author: John Snow Labs +name: frugalscore_medium_roberta_bert_score +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frugalscore_medium_roberta_bert_score` is a English model originally trained by moussaKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frugalscore_medium_roberta_bert_score_en_5.5.0_3.0_1727034480985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frugalscore_medium_roberta_bert_score_en_5.5.0_3.0_1727034480985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("frugalscore_medium_roberta_bert_score","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("frugalscore_medium_roberta_bert_score", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frugalscore_medium_roberta_bert_score| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|155.2 MB| + +## References + +https://huggingface.co/moussaKam/frugalscore_medium_roberta_bert-score \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fyp_en.md b/docs/_posts/ahmedlone127/2024-09-22-fyp_en.md new file mode 100644 index 00000000000000..1cf759c53f520b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fyp_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: English fyp T5Transformer from yaashwardhan +author: John Snow Labs +name: fyp +date: 2024-09-22 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fyp` is a English model originally trained by yaashwardhan. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fyp_en_5.5.0_3.0_1727034850196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fyp_en_5.5.0_3.0_1727034850196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +t5 = T5Transformer.pretrained("fyp","en") \ + .setInputCols(["document"]) \ + .setOutputCol("output") + +pipeline = Pipeline().setStages([documentAssembler, t5]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val t5 = T5Transformer.pretrained("fyp", "en") + .setInputCols(Array("documents")) + .setOutputCol("output") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, t5)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fyp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +References + +https://huggingface.co/yaashwardhan/fyp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_es.md b/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_es.md new file mode 100644 index 00000000000000..38ea5ae5107bfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish gs3n_roberta_model RoBertaForSequenceClassification from erickdp +author: John Snow Labs +name: gs3n_roberta_model +date: 2024-09-22 +tags: [es, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gs3n_roberta_model` is a Castilian, Spanish model originally trained by erickdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gs3n_roberta_model_es_5.5.0_3.0_1727017134506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gs3n_roberta_model_es_5.5.0_3.0_1727017134506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("gs3n_roberta_model","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("gs3n_roberta_model", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gs3n_roberta_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|es| +|Size:|408.3 MB| + +## References + +https://huggingface.co/erickdp/gs3n-roberta-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_pipeline_es.md new file mode 100644 index 00000000000000..5ebd95fd36b51f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-gs3n_roberta_model_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish gs3n_roberta_model_pipeline pipeline RoBertaForSequenceClassification from erickdp +author: John Snow Labs +name: gs3n_roberta_model_pipeline +date: 2024-09-22 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gs3n_roberta_model_pipeline` is a Castilian, Spanish model originally trained by erickdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gs3n_roberta_model_pipeline_es_5.5.0_3.0_1727017153221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gs3n_roberta_model_pipeline_es_5.5.0_3.0_1727017153221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gs3n_roberta_model_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gs3n_roberta_model_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gs3n_roberta_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|408.3 MB| + +## References + +https://huggingface.co/erickdp/gs3n-roberta-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..9f2e5ab7550120 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726972267089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1726972267089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed1_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed1-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_en.md new file mode 100644 index 00000000000000..510d88f606a683 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed1_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed1_bertweet_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed1_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bertweet_large_en_5.5.0_3.0_1727027488301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bertweet_large_en_5.5.0_3.0_1727027488301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed1_bertweet_large","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed1_bertweet_large", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed1_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed1-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..cce7329fcb97ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_balance_random3_seed1_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed1_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed1_bertweet_large_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed1_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bertweet_large_pipeline_en_5.5.0_3.0_1727027584939.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bertweet_large_pipeline_en_5.5.0_3.0_1727027584939.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random3_seed1_bertweet_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random3_seed1_bertweet_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed1_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed1-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..3cd79ed0f0bfa5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1727027483050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1727027483050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random2_seed2_twitter_roberta_large_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random2_seed2-twitter-roberta-large-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hello_classification_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hello_classification_model_pipeline_en.md new file mode 100644 index 00000000000000..9351f8c1cce6ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hello_classification_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hello_classification_model_pipeline pipeline DistilBertForSequenceClassification from krishnareddy +author: John Snow Labs +name: hello_classification_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hello_classification_model_pipeline` is a English model originally trained by krishnareddy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hello_classification_model_pipeline_en_5.5.0_3.0_1727012467443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hello_classification_model_pipeline_en_5.5.0_3.0_1727012467443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hello_classification_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hello_classification_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hello_classification_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/krishnareddy/hello_classification_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hin_trac1_fin_en.md b/docs/_posts/ahmedlone127/2024-09-22-hin_trac1_fin_en.md new file mode 100644 index 00000000000000..5f4c081f304a21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hin_trac1_fin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hin_trac1_fin BertForSequenceClassification from Maha +author: John Snow Labs +name: hin_trac1_fin +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hin_trac1_fin` is a English model originally trained by Maha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hin_trac1_fin_en_5.5.0_3.0_1726988626300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hin_trac1_fin_en_5.5.0_3.0_1726988626300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("hin_trac1_fin","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("hin_trac1_fin", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hin_trac1_fin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/Maha/hin-trac1_fin \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hindi_codemixed_abusive_muril_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hindi_codemixed_abusive_muril_pipeline_en.md new file mode 100644 index 00000000000000..a1ce0ef69e3e18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hindi_codemixed_abusive_muril_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hindi_codemixed_abusive_muril_pipeline pipeline BertForSequenceClassification from Hate-speech-CNERG +author: John Snow Labs +name: hindi_codemixed_abusive_muril_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_codemixed_abusive_muril_pipeline` is a English model originally trained by Hate-speech-CNERG. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_codemixed_abusive_muril_pipeline_en_5.5.0_3.0_1727032243043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_codemixed_abusive_muril_pipeline_en_5.5.0_3.0_1727032243043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hindi_codemixed_abusive_muril_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hindi_codemixed_abusive_muril_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_codemixed_abusive_muril_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|892.7 MB| + +## References + +https://huggingface.co/Hate-speech-CNERG/hindi-codemixed-abusive-MuRIL + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_en.md b/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_en.md new file mode 100644 index 00000000000000..0f6204b7890de3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imbalanced_model_title_9 RoBertaForSequenceClassification from amishshah +author: John Snow Labs +name: imbalanced_model_title_9 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imbalanced_model_title_9` is a English model originally trained by amishshah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imbalanced_model_title_9_en_5.5.0_3.0_1727037143868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imbalanced_model_title_9_en_5.5.0_3.0_1727037143868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("imbalanced_model_title_9","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("imbalanced_model_title_9", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imbalanced_model_title_9| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/amishshah/imbalanced_model_title_9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_pipeline_en.md new file mode 100644 index 00000000000000..42ae3b936e4cc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imbalanced_model_title_9_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imbalanced_model_title_9_pipeline pipeline RoBertaForSequenceClassification from amishshah +author: John Snow Labs +name: imbalanced_model_title_9_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imbalanced_model_title_9_pipeline` is a English model originally trained by amishshah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imbalanced_model_title_9_pipeline_en_5.5.0_3.0_1727037169832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imbalanced_model_title_9_pipeline_en_5.5.0_3.0_1727037169832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imbalanced_model_title_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imbalanced_model_title_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imbalanced_model_title_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/amishshah/imbalanced_model_title_9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdb4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdb4_pipeline_en.md new file mode 100644 index 00000000000000..f974fb4a112239 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdb4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdb4_pipeline pipeline BertForSequenceClassification from Lumos +author: John Snow Labs +name: imdb4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb4_pipeline` is a English model originally trained by Lumos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb4_pipeline_en_5.5.0_3.0_1727032523653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb4_pipeline_en_5.5.0_3.0_1727032523653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdb4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdb4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lumos/imdb4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdb_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdb_5_pipeline_en.md new file mode 100644 index 00000000000000..c60a5ce86066b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdb_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdb_5_pipeline pipeline DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: imdb_5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_5_pipeline` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_5_pipeline_en_5.5.0_3.0_1726980746279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_5_pipeline_en_5.5.0_3.0_1726980746279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdb_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdb_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/imdb_5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_jv.md b/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_jv.md new file mode 100644 index 00000000000000..458c4d3895d5eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_jv.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Javanese javanese_bert_small_imdb_classifier BertForSequenceClassification from w11wo +author: John Snow Labs +name: javanese_bert_small_imdb_classifier +date: 2024-09-22 +tags: [jv, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`javanese_bert_small_imdb_classifier` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/javanese_bert_small_imdb_classifier_jv_5.5.0_3.0_1727032157896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/javanese_bert_small_imdb_classifier_jv_5.5.0_3.0_1727032157896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("javanese_bert_small_imdb_classifier","jv") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("javanese_bert_small_imdb_classifier", "jv") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|javanese_bert_small_imdb_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|jv| +|Size:|409.5 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small-imdb-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_en.md b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_en.md new file mode 100644 index 00000000000000..228a40a27d0628 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jerteh355sentpos4 RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentpos4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentpos4` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentpos4_en_5.5.0_3.0_1727026659452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentpos4_en_5.5.0_3.0_1727026659452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentpos4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentpos4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentpos4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTPOS4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_pipeline_en.md new file mode 100644 index 00000000000000..304a1f44aedb1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jerteh355sentpos4_pipeline pipeline RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentpos4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentpos4_pipeline` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentpos4_pipeline_en_5.5.0_3.0_1727026730786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentpos4_pipeline_en_5.5.0_3.0_1727026730786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jerteh355sentpos4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jerteh355sentpos4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentpos4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTPOS4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_en.md b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_en.md new file mode 100644 index 00000000000000..a0ca5372e1b2a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jerteh355sentpos6 RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentpos6 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentpos6` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentpos6_en_5.5.0_3.0_1727017179028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentpos6_en_5.5.0_3.0_1727017179028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentpos6","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("jerteh355sentpos6", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentpos6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTPOS6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_pipeline_en.md new file mode 100644 index 00000000000000..2846894566d200 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jerteh355sentpos6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jerteh355sentpos6_pipeline pipeline RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentpos6_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentpos6_pipeline` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentpos6_pipeline_en_5.5.0_3.0_1727017241673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentpos6_pipeline_en_5.5.0_3.0_1727017241673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jerteh355sentpos6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jerteh355sentpos6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentpos6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTPOS6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_pipeline_en.md new file mode 100644 index 00000000000000..a144b16e361d36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jobclassifier_v2_pipeline pipeline BertForSequenceClassification from CleveGreen +author: John Snow Labs +name: jobclassifier_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jobclassifier_v2_pipeline` is a English model originally trained by CleveGreen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jobclassifier_v2_pipeline_en_5.5.0_3.0_1727030617677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jobclassifier_v2_pipeline_en_5.5.0_3.0_1727030617677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jobclassifier_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jobclassifier_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jobclassifier_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.3 MB| + +## References + +https://huggingface.co/CleveGreen/JobClassifier_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-kgi_en.md b/docs/_posts/ahmedlone127/2024-09-22-kgi_en.md new file mode 100644 index 00000000000000..9120c1073a16ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-kgi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kgi DistilBertForSequenceClassification from shrikant11 +author: John Snow Labs +name: kgi +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kgi` is a English model originally trained by shrikant11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kgi_en_5.5.0_3.0_1727033256079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kgi_en_5.5.0_3.0_1727033256079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("kgi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("kgi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kgi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/shrikant11/KGI \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_pipeline_en.md new file mode 100644 index 00000000000000..bfee439b1edf3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-kitchen_applinces_bert_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kitchen_applinces_bert_classifier_pipeline pipeline DistilBertForSequenceClassification from decepticonsIsAllYouNeed +author: John Snow Labs +name: kitchen_applinces_bert_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kitchen_applinces_bert_classifier_pipeline` is a English model originally trained by decepticonsIsAllYouNeed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kitchen_applinces_bert_classifier_pipeline_en_5.5.0_3.0_1727033594640.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kitchen_applinces_bert_classifier_pipeline_en_5.5.0_3.0_1727033594640.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kitchen_applinces_bert_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kitchen_applinces_bert_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kitchen_applinces_bert_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.8 MB| + +## References + +https://huggingface.co/decepticonsIsAllYouNeed/kitchen_applinces_bert_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-korscm_mbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-korscm_mbert_pipeline_en.md new file mode 100644 index 00000000000000..fcee1eff756252 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-korscm_mbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English korscm_mbert_pipeline pipeline BertForSequenceClassification from DeadBeast +author: John Snow Labs +name: korscm_mbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korscm_mbert_pipeline` is a English model originally trained by DeadBeast. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korscm_mbert_pipeline_en_5.5.0_3.0_1726988673586.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korscm_mbert_pipeline_en_5.5.0_3.0_1726988673586.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("korscm_mbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("korscm_mbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korscm_mbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/DeadBeast/korscm-mBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_pipeline_sv.md b/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_pipeline_sv.md new file mode 100644 index 00000000000000..4f37dd8f3f9f81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_pipeline_sv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Swedish lab2_id2223_pipeline pipeline WhisperForCTC from humeur +author: John Snow Labs +name: lab2_id2223_pipeline +date: 2024-09-22 +tags: [sv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_id2223_pipeline` is a Swedish model originally trained by humeur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_id2223_pipeline_sv_5.5.0_3.0_1727025215707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_id2223_pipeline_sv_5.5.0_3.0_1727025215707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab2_id2223_pipeline", lang = "sv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab2_id2223_pipeline", lang = "sv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_id2223_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/humeur/lab2_id2223 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_sv.md b/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_sv.md new file mode 100644 index 00000000000000..7683614acbe2d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-lab2_id2223_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish lab2_id2223 WhisperForCTC from humeur +author: John Snow Labs +name: lab2_id2223 +date: 2024-09-22 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_id2223` is a Swedish model originally trained by humeur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_id2223_sv_5.5.0_3.0_1727025133638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_id2223_sv_5.5.0_3.0_1727025133638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("lab2_id2223","sv") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("lab2_id2223", "sv") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_id2223| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/humeur/lab2_id2223 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_pipeline_en.md new file mode 100644 index 00000000000000..b529fe6146aa1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-legal_base_v1_5__checkpoint2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English legal_base_v1_5__checkpoint2_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: legal_base_v1_5__checkpoint2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_base_v1_5__checkpoint2_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint2_pipeline_en_5.5.0_3.0_1727042054519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_base_v1_5__checkpoint2_pipeline_en_5.5.0_3.0_1727042054519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legal_base_v1_5__checkpoint2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legal_base_v1_5__checkpoint2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_base_v1_5__checkpoint2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.5 MB| + +## References + +https://huggingface.co/eduagarcia-temp/legal_base_v1_5__checkpoint2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_en.md b/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_en.md new file mode 100644 index 00000000000000..6c3d426f5b86b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English legal_bert_squad_law BertForQuestionAnswering from lisa +author: John Snow Labs +name: legal_bert_squad_law +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_squad_law` is a English model originally trained by lisa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_squad_law_en_5.5.0_3.0_1726978911133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_squad_law_en_5.5.0_3.0_1726978911133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_squad_law","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("legal_bert_squad_law", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_squad_law| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/lisa/legal-bert-squad-law \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_pipeline_en.md new file mode 100644 index 00000000000000..05a6fcf1ea4791 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-legal_bert_squad_law_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English legal_bert_squad_law_pipeline pipeline BertForQuestionAnswering from lisa +author: John Snow Labs +name: legal_bert_squad_law_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_squad_law_pipeline` is a English model originally trained by lisa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_squad_law_pipeline_en_5.5.0_3.0_1726978929448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_squad_law_pipeline_en_5.5.0_3.0_1726978929448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legal_bert_squad_law_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legal_bert_squad_law_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_squad_law_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/lisa/legal-bert-squad-law + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_en.md b/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_en.md new file mode 100644 index 00000000000000..dfedddb16db12f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lenate_model_4 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_4` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_4_en_5.5.0_3.0_1726980427424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_4_en_5.5.0_3.0_1726980427424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lenate_model_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_pipeline_en.md new file mode 100644 index 00000000000000..869ba74fb58496 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-lenate_model_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lenate_model_4_pipeline pipeline DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: lenate_model_4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lenate_model_4_pipeline` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lenate_model_4_pipeline_en_5.5.0_3.0_1726980439372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lenate_model_4_pipeline_en_5.5.0_3.0_1726980439372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lenate_model_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lenate_model_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lenate_model_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/lenate_model_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-leth_en.md b/docs/_posts/ahmedlone127/2024-09-22-leth_en.md new file mode 100644 index 00000000000000..0882fcfabf997e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-leth_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English leth RoBertaForSequenceClassification from Arlethh +author: John Snow Labs +name: leth +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`leth` is a English model originally trained by Arlethh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/leth_en_5.5.0_3.0_1727016715405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/leth_en_5.5.0_3.0_1727016715405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("leth","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("leth", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|leth| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Arlethh/leth \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-leth_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-leth_pipeline_en.md new file mode 100644 index 00000000000000..9f1273877fe15a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-leth_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English leth_pipeline pipeline RoBertaForSequenceClassification from Arlethh +author: John Snow Labs +name: leth_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`leth_pipeline` is a English model originally trained by Arlethh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/leth_pipeline_en_5.5.0_3.0_1727016730115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/leth_pipeline_en_5.5.0_3.0_1727016730115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("leth_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("leth_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|leth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Arlethh/leth + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_pipeline_en.md new file mode 100644 index 00000000000000..1930a245aa048a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llm_practice001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llm_practice001_pipeline pipeline DistilBertForSequenceClassification from JiAYu1997 +author: John Snow Labs +name: llm_practice001_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_practice001_pipeline` is a English model originally trained by JiAYu1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_practice001_pipeline_en_5.5.0_3.0_1727020517685.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_practice001_pipeline_en_5.5.0_3.0_1727020517685.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llm_practice001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llm_practice001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_practice001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JiAYu1997/LLM_Practice001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llm_project_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-llm_project_pipeline_en.md new file mode 100644 index 00000000000000..21e703669b338e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llm_project_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English llm_project_pipeline pipeline DistilBertForSequenceClassification from ThuyTran102 +author: John Snow Labs +name: llm_project_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_project_pipeline` is a English model originally trained by ThuyTran102. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_project_pipeline_en_5.5.0_3.0_1727020571975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_project_pipeline_en_5.5.0_3.0_1727020571975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("llm_project_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("llm_project_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_project_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ThuyTran102/LLM_project + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llmhw01_chtsai2104_en.md b/docs/_posts/ahmedlone127/2024-09-22-llmhw01_chtsai2104_en.md new file mode 100644 index 00000000000000..307642b8292f20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llmhw01_chtsai2104_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llmhw01_chtsai2104 DistilBertForSequenceClassification from chtsai2104 +author: John Snow Labs +name: llmhw01_chtsai2104 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llmhw01_chtsai2104` is a English model originally trained by chtsai2104. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llmhw01_chtsai2104_en_5.5.0_3.0_1726980108123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llmhw01_chtsai2104_en_5.5.0_3.0_1726980108123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmhw01_chtsai2104","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llmhw01_chtsai2104", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llmhw01_chtsai2104| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chtsai2104/llmhw01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_en.md new file mode 100644 index 00000000000000..fb87ec854f42ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English malicious_prompt_classifier DistilBertForSequenceClassification from Al-Chan +author: John Snow Labs +name: malicious_prompt_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malicious_prompt_classifier` is a English model originally trained by Al-Chan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malicious_prompt_classifier_en_5.5.0_3.0_1727012789878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malicious_prompt_classifier_en_5.5.0_3.0_1727012789878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("malicious_prompt_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("malicious_prompt_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malicious_prompt_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Al-Chan/Malicious_Prompt_Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_pipeline_en.md new file mode 100644 index 00000000000000..75de87ed103d0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-malicious_prompt_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English malicious_prompt_classifier_pipeline pipeline DistilBertForSequenceClassification from Al-Chan +author: John Snow Labs +name: malicious_prompt_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malicious_prompt_classifier_pipeline` is a English model originally trained by Al-Chan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malicious_prompt_classifier_pipeline_en_5.5.0_3.0_1727012801765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malicious_prompt_classifier_pipeline_en_5.5.0_3.0_1727012801765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("malicious_prompt_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("malicious_prompt_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malicious_prompt_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Al-Chan/Malicious_Prompt_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-marathi_sentiment_movie_reviews_mr.md b/docs/_posts/ahmedlone127/2024-09-22-marathi_sentiment_movie_reviews_mr.md new file mode 100644 index 00000000000000..894d475d830368 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-marathi_sentiment_movie_reviews_mr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Marathi marathi_sentiment_movie_reviews BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: marathi_sentiment_movie_reviews +date: 2024-09-22 +tags: [mr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_sentiment_movie_reviews` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_sentiment_movie_reviews_mr_5.5.0_3.0_1727007645855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_sentiment_movie_reviews_mr_5.5.0_3.0_1727007645855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("marathi_sentiment_movie_reviews","mr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("marathi_sentiment_movie_reviews", "mr") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_sentiment_movie_reviews| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|mr| +|Size:|892.8 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-sentiment-movie-reviews \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-md3_sentiment_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-md3_sentiment_analysis_pipeline_en.md new file mode 100644 index 00000000000000..e40450980c1c58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-md3_sentiment_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English md3_sentiment_analysis_pipeline pipeline DistilBertForSequenceClassification from kassfir +author: John Snow Labs +name: md3_sentiment_analysis_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`md3_sentiment_analysis_pipeline` is a English model originally trained by kassfir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/md3_sentiment_analysis_pipeline_en_5.5.0_3.0_1727033269052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/md3_sentiment_analysis_pipeline_en_5.5.0_3.0_1727033269052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("md3_sentiment_analysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("md3_sentiment_analysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|md3_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kassfir/md3-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en.md new file mode 100644 index 00000000000000..81d0cd4cf814f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English minilmv2_l6_h384_mlm_multi_emails_hq_pipeline pipeline RoBertaEmbeddings from postbot +author: John Snow Labs +name: minilmv2_l6_h384_mlm_multi_emails_hq_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`minilmv2_l6_h384_mlm_multi_emails_hq_pipeline` is a English model originally trained by postbot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en_5.5.0_3.0_1727042054048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minilmv2_l6_h384_mlm_multi_emails_hq_pipeline_en_5.5.0_3.0_1727042054048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("minilmv2_l6_h384_mlm_multi_emails_hq_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("minilmv2_l6_h384_mlm_multi_emails_hq_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minilmv2_l6_h384_mlm_multi_emails_hq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|114.2 MB| + +## References + +https://huggingface.co/postbot/MiniLMv2-L6-H384-mlm-multi-emails-hq + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_pipeline_en.md new file mode 100644 index 00000000000000..9050197fa7d548 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mmarco_mminilmv2_l12_h384_v1_pipeline pipeline XlmRoBertaForSequenceClassification from lpsantao +author: John Snow Labs +name: mmarco_mminilmv2_l12_h384_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mmarco_mminilmv2_l12_h384_v1_pipeline` is a English model originally trained by lpsantao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mmarco_mminilmv2_l12_h384_v1_pipeline_en_5.5.0_3.0_1727009979068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mmarco_mminilmv2_l12_h384_v1_pipeline_en_5.5.0_3.0_1727009979068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mmarco_mminilmv2_l12_h384_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mmarco_mminilmv2_l12_h384_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mmarco_mminilmv2_l12_h384_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|399.6 MB| + +## References + +https://huggingface.co/lpsantao/mmarco-mMiniLMv2-L12-H384-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_2_7_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_2_7_en.md new file mode 100644 index 00000000000000..25b1c893bd5926 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_2_7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_2_7 RoBertaForSequenceClassification from raydentseng +author: John Snow Labs +name: model_2_7 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_2_7` is a English model originally trained by raydentseng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_2_7_en_5.5.0_3.0_1727037745999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_2_7_en_5.5.0_3.0_1727037745999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_2_7","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_2_7", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_2_7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/raydentseng/model_2_7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_4_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_4_en.md new file mode 100644 index 00000000000000..98fff5d1deae31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_4 BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_4` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_4_en_5.5.0_3.0_1727034784048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_4_en_5.5.0_3.0_1727034784048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("model_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("model_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_4_pipeline_en.md new file mode 100644 index 00000000000000..fa1f34cbc86f31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_4_pipeline pipeline BertForSequenceClassification from cannotbolt +author: John Snow Labs +name: model_4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_4_pipeline` is a English model originally trained by cannotbolt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_4_pipeline_en_5.5.0_3.0_1727034804772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_4_pipeline_en_5.5.0_3.0_1727034804772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cannotbolt/model_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_en.md new file mode 100644 index 00000000000000..9596e1218d5907 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_token_classification_bert_base_ner BertForTokenClassification from Ornelas7 +author: John Snow Labs +name: model_token_classification_bert_base_ner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_token_classification_bert_base_ner` is a English model originally trained by Ornelas7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_token_classification_bert_base_ner_en_5.5.0_3.0_1727045793742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_token_classification_bert_base_ner_en_5.5.0_3.0_1727045793742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("model_token_classification_bert_base_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("model_token_classification_bert_base_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_token_classification_bert_base_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Ornelas7/model-token-classification-bert-base-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-modelofine4_en.md b/docs/_posts/ahmedlone127/2024-09-22-modelofine4_en.md new file mode 100644 index 00000000000000..f0346789beeed3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-modelofine4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English modelofine4 RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: modelofine4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelofine4` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelofine4_en_5.5.0_3.0_1727027620273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelofine4_en_5.5.0_3.0_1727027620273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("modelofine4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("modelofine4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelofine4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/modelofine4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-modelofine4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-modelofine4_pipeline_en.md new file mode 100644 index 00000000000000..ecf6ae61ccc3ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-modelofine4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English modelofine4_pipeline pipeline RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: modelofine4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelofine4_pipeline` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelofine4_pipeline_en_5.5.0_3.0_1727027647545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelofine4_pipeline_en_5.5.0_3.0_1727027647545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("modelofine4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("modelofine4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelofine4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/modelofine4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_en.md b/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_en.md new file mode 100644 index 00000000000000..d6c89069d987a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mpoclassification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mpoclassification DistilBertForSequenceClassification from inXistant +author: John Snow Labs +name: mpoclassification +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mpoclassification` is a English model originally trained by inXistant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpoclassification_en_5.5.0_3.0_1727033889376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpoclassification_en_5.5.0_3.0_1727033889376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("mpoclassification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mpoclassification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpoclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/inXistant/MPOClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-multiwd_en.md b/docs/_posts/ahmedlone127/2024-09-22-multiwd_en.md new file mode 100644 index 00000000000000..fedb16934b1f35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-multiwd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English multiwd BertForSequenceClassification from Tianlin668 +author: John Snow Labs +name: multiwd +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multiwd` is a English model originally trained by Tianlin668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multiwd_en_5.5.0_3.0_1727030510510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multiwd_en_5.5.0_3.0_1727030510510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("multiwd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("multiwd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multiwd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.8 MB| + +## References + +https://huggingface.co/Tianlin668/MultiWD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_pipeline_en.md new file mode 100644 index 00000000000000..820297ffd5bf25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mysterious_bouncy_flan_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mysterious_bouncy_flan_2_pipeline pipeline DistilBertForSequenceClassification from gaodrew +author: John Snow Labs +name: mysterious_bouncy_flan_2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mysterious_bouncy_flan_2_pipeline` is a English model originally trained by gaodrew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mysterious_bouncy_flan_2_pipeline_en_5.5.0_3.0_1727012825109.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mysterious_bouncy_flan_2_pipeline_en_5.5.0_3.0_1727012825109.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mysterious_bouncy_flan_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mysterious_bouncy_flan_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mysterious_bouncy_flan_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gaodrew/mysterious-bouncy-flan-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_en.md new file mode 100644 index 00000000000000..4d4138839bdaeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_imdb_padding90model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_imdb_padding90model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_imdb_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding90model_en_5.5.0_3.0_1727013035647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding90model_en_5.5.0_3.0_1727013035647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_imdb_padding90model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_imdb_padding90model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_imdb_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_imdb_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..b66eb61b62c96f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_imdb_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_imdb_padding90model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_imdb_padding90model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_imdb_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding90model_pipeline_en_5.5.0_3.0_1727013048202.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_imdb_padding90model_pipeline_en_5.5.0_3.0_1727013048202.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_imdb_padding90model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_imdb_padding90model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_imdb_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_imdb_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_en.md new file mode 100644 index 00000000000000..669f924ee82737 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst2_padding50model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst2_padding50model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst2_padding50model +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst2_padding50model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding50model_en_5.5.0_3.0_1727020791054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst2_padding50model_en_5.5.0_3.0_1727020791054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst2_padding50model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst2_padding50model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst2_padding50model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst2_padding50model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_en.md b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_en.md new file mode 100644 index 00000000000000..b6e415c98620aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-n_distilbert_sst5_padding10model_realgon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst5_padding10model_realgon DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding10model_realgon +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding10model_realgon` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding10model_realgon_en_5.5.0_3.0_1727033698893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding10model_realgon_en_5.5.0_3.0_1727033698893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding10model_realgon","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding10model_realgon", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding10model_realgon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding10model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_en.md new file mode 100644 index 00000000000000..cb535d2285384d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ehr_spanish_model_mulitlingual_bert BertForTokenClassification from ajtamayoh +author: John Snow Labs +name: ner_ehr_spanish_model_mulitlingual_bert +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ehr_spanish_model_mulitlingual_bert` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ehr_spanish_model_mulitlingual_bert_en_5.5.0_3.0_1727040450475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ehr_spanish_model_mulitlingual_bert_en_5.5.0_3.0_1727040450475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("ner_ehr_spanish_model_mulitlingual_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("ner_ehr_spanish_model_mulitlingual_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ehr_spanish_model_mulitlingual_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ajtamayoh/NER_EHR_Spanish_model_Mulitlingual_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_pipeline_en.md new file mode 100644 index 00000000000000..868545e6ac6f18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_ehr_spanish_model_mulitlingual_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_ehr_spanish_model_mulitlingual_bert_pipeline pipeline BertForTokenClassification from ajtamayoh +author: John Snow Labs +name: ner_ehr_spanish_model_mulitlingual_bert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ehr_spanish_model_mulitlingual_bert_pipeline` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ehr_spanish_model_mulitlingual_bert_pipeline_en_5.5.0_3.0_1727040484405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ehr_spanish_model_mulitlingual_bert_pipeline_en_5.5.0_3.0_1727040484405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_ehr_spanish_model_mulitlingual_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_ehr_spanish_model_mulitlingual_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ehr_spanish_model_mulitlingual_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ajtamayoh/NER_EHR_Spanish_model_Mulitlingual_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_pipeline_en.md new file mode 100644 index 00000000000000..6d17f8cf4cc9f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_gec_roberta_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_gec_roberta_v3_pipeline pipeline RoBertaForTokenClassification from fursov +author: John Snow Labs +name: ner_gec_roberta_v3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_gec_roberta_v3_pipeline` is a English model originally trained by fursov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_gec_roberta_v3_pipeline_en_5.5.0_3.0_1727048557044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_gec_roberta_v3_pipeline_en_5.5.0_3.0_1727048557044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_gec_roberta_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_gec_roberta_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_gec_roberta_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.1 MB| + +## References + +https://huggingface.co/fursov/ner-gec-roberta-v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_ner_random2_seed2_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_ner_random2_seed2_bernice_pipeline_en.md new file mode 100644 index 00000000000000..40856b705332ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_ner_random2_seed2_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_ner_random2_seed2_bernice_pipeline pipeline XlmRoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed2_bernice_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed2_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_bernice_pipeline_en_5.5.0_3.0_1726970022139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_bernice_pipeline_en_5.5.0_3.0_1726970022139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_ner_random2_seed2_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_ner_random2_seed2_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed2_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|802.5 MB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed2-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_en.md new file mode 100644 index 00000000000000..a95a650fc0661f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_serverstable_v0 BertForTokenClassification from procit002 +author: John Snow Labs +name: ner_serverstable_v0 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_serverstable_v0` is a English model originally trained by procit002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_serverstable_v0_en_5.5.0_3.0_1727045435582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_serverstable_v0_en_5.5.0_3.0_1727045435582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("ner_serverstable_v0","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("ner_serverstable_v0", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_serverstable_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/procit002/NER_ServerStable_v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_en.md new file mode 100644 index 00000000000000..66b3289eb43bd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English neurips_distilbert_combined_1 DistilBertForSequenceClassification from neurips-user +author: John Snow Labs +name: neurips_distilbert_combined_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`neurips_distilbert_combined_1` is a English model originally trained by neurips-user. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/neurips_distilbert_combined_1_en_5.5.0_3.0_1727020853525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/neurips_distilbert_combined_1_en_5.5.0_3.0_1727020853525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("neurips_distilbert_combined_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("neurips_distilbert_combined_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|neurips_distilbert_combined_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/neurips-user/neurips-distilbert-combined-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_pipeline_en.md new file mode 100644 index 00000000000000..c7ec04a8bb6ddd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-neurips_distilbert_combined_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English neurips_distilbert_combined_1_pipeline pipeline DistilBertForSequenceClassification from neurips-user +author: John Snow Labs +name: neurips_distilbert_combined_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`neurips_distilbert_combined_1_pipeline` is a English model originally trained by neurips-user. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/neurips_distilbert_combined_1_pipeline_en_5.5.0_3.0_1727020865318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/neurips_distilbert_combined_1_pipeline_en_5.5.0_3.0_1727020865318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("neurips_distilbert_combined_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("neurips_distilbert_combined_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|neurips_distilbert_combined_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/neurips-user/neurips-distilbert-combined-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en.md new file mode 100644 index 00000000000000..e71e3b112467c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline pipeline RoBertaForSequenceClassification from NinjaBanana1 +author: John Snow Labs +name: nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline` is a English model originally trained by NinjaBanana1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en_5.5.0_3.0_1726972020072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline_en_5.5.0_3.0_1726972020072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nli_roberta_base_finetuned_for_amazon_review_ratings_ninjabanana1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/NinjaBanana1/nli-roberta-base-finetuned-for-amazon-review-ratings + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-nlp_hf_workshop2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-nlp_hf_workshop2_pipeline_en.md new file mode 100644 index 00000000000000..d7345173374c8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-nlp_hf_workshop2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_hf_workshop2_pipeline pipeline DistilBertForSequenceClassification from Bahareh0281 +author: John Snow Labs +name: nlp_hf_workshop2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_hf_workshop2_pipeline` is a English model originally trained by Bahareh0281. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop2_pipeline_en_5.5.0_3.0_1726980539416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_hf_workshop2_pipeline_en_5.5.0_3.0_1726980539416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp_hf_workshop2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp_hf_workshop2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_hf_workshop2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/Bahareh0281/NLP_HF_Workshop2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-p_model_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-p_model_2_en.md new file mode 100644 index 00000000000000..7aa5e6392e2b49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-p_model_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English p_model_2 DistilBertForSequenceClassification from Habaznya +author: John Snow Labs +name: p_model_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`p_model_2` is a English model originally trained by Habaznya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/p_model_2_en_5.5.0_3.0_1727012560507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/p_model_2_en_5.5.0_3.0_1727012560507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("p_model_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("p_model_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|p_model_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/Habaznya/p_model_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-paludistilbertpartaugmentedoriginal_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-paludistilbertpartaugmentedoriginal_pipeline_en.md new file mode 100644 index 00000000000000..1b75d928934e8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-paludistilbertpartaugmentedoriginal_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English paludistilbertpartaugmentedoriginal_pipeline pipeline DistilBertForSequenceClassification from Palu001 +author: John Snow Labs +name: paludistilbertpartaugmentedoriginal_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paludistilbertpartaugmentedoriginal_pipeline` is a English model originally trained by Palu001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paludistilbertpartaugmentedoriginal_pipeline_en_5.5.0_3.0_1726980236573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paludistilbertpartaugmentedoriginal_pipeline_en_5.5.0_3.0_1726980236573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("paludistilbertpartaugmentedoriginal_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("paludistilbertpartaugmentedoriginal_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paludistilbertpartaugmentedoriginal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Palu001/PaluDistilbertPartAugmentedOriginal + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_en.md b/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_en.md new file mode 100644 index 00000000000000..ccce7ffe864284 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English panamianmodel RoBertaEmbeddings from joseangelatm +author: John Snow Labs +name: panamianmodel +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`panamianmodel` is a English model originally trained by joseangelatm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/panamianmodel_en_5.5.0_3.0_1727041601341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/panamianmodel_en_5.5.0_3.0_1727041601341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("panamianmodel","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("panamianmodel","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|panamianmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.4 MB| + +## References + +https://huggingface.co/joseangelatm/PanamianModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_pipeline_en.md new file mode 100644 index 00000000000000..13defd3cfa4e55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-panamianmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English panamianmodel_pipeline pipeline RoBertaEmbeddings from joseangelatm +author: John Snow Labs +name: panamianmodel_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`panamianmodel_pipeline` is a English model originally trained by joseangelatm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/panamianmodel_pipeline_en_5.5.0_3.0_1727041624936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/panamianmodel_pipeline_en_5.5.0_3.0_1727041624936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("panamianmodel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("panamianmodel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|panamianmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.4 MB| + +## References + +https://huggingface.co/joseangelatm/PanamianModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-persian_text_emotion_bert_v1_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-22-persian_text_emotion_bert_v1_pipeline_fa.md new file mode 100644 index 00000000000000..a62fe6a28fb298 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-persian_text_emotion_bert_v1_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian persian_text_emotion_bert_v1_pipeline pipeline BertForSequenceClassification from SeyedAli +author: John Snow Labs +name: persian_text_emotion_bert_v1_pipeline +date: 2024-09-22 +tags: [fa, open_source, pipeline, onnx] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`persian_text_emotion_bert_v1_pipeline` is a Persian model originally trained by SeyedAli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/persian_text_emotion_bert_v1_pipeline_fa_5.5.0_3.0_1726988688390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/persian_text_emotion_bert_v1_pipeline_fa_5.5.0_3.0_1726988688390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("persian_text_emotion_bert_v1_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("persian_text_emotion_bert_v1_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|persian_text_emotion_bert_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|608.7 MB| + +## References + +https://huggingface.co/SeyedAli/Persian-Text-Emotion-Bert-V1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_angrim_en.md b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_angrim_en.md new file mode 100644 index 00000000000000..b9c9f5e8779113 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_angrim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_angrim RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_angrim +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_angrim` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_angrim_en_5.5.0_3.0_1727027149342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_angrim_en_5.5.0_3.0_1727027149342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_angrim","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_mrpc_glue_angrim", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_angrim| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-mrpc-glue-angrim \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en.md new file mode 100644 index 00000000000000..0d760bdf7bda2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline pipeline RoBertaForSequenceClassification from ricardotalavera +author: John Snow Labs +name: platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline` is a English model originally trained by ricardotalavera. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en_5.5.0_3.0_1727026782855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline_en_5.5.0_3.0_1727026782855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_mrpc_glue_ricardo_talavera_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/ricardotalavera/platzi-distilroberta-base-mrpc-glue-ricardo-talavera + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_en.md b/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_en.md new file mode 100644 index 00000000000000..ede5721ffc3d62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-proposed_mediumf_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English proposed_mediumf_model RoBertaEmbeddings from athar +author: John Snow Labs +name: proposed_mediumf_model +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`proposed_mediumf_model` is a English model originally trained by athar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/proposed_mediumf_model_en_5.5.0_3.0_1727041549211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/proposed_mediumf_model_en_5.5.0_3.0_1727041549211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("proposed_mediumf_model","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("proposed_mediumf_model","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|proposed_mediumf_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|364.7 MB| + +## References + +https://huggingface.co/athar/proposed_MEDIUMF-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_en.md new file mode 100644 index 00000000000000..3c4daca7052bfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ptcrawl_plus_legal_base_v3_5__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_base_v3_5__checkpoint_last +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_base_v3_5__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v3_5__checkpoint_last_en_5.5.0_3.0_1727041842066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v3_5__checkpoint_last_en_5.5.0_3.0_1727041842066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_base_v3_5__checkpoint_last","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_base_v3_5__checkpoint_last","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_base_v3_5__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|296.7 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_base_v3_5__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_xx.md b/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_xx.md new file mode 100644 index 00000000000000..853146ef143518 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-qa_bert_base_multilingual_cased_finetuned_squad_xx.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Multilingual qa_bert_base_multilingual_cased_finetuned_squad BertForQuestionAnswering from itsamitkumar +author: John Snow Labs +name: qa_bert_base_multilingual_cased_finetuned_squad +date: 2024-09-22 +tags: [xx, open_source, onnx, question_answering, bert] +task: Question Answering +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_bert_base_multilingual_cased_finetuned_squad` is a Multilingual model originally trained by itsamitkumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_bert_base_multilingual_cased_finetuned_squad_xx_5.5.0_3.0_1727049211091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_bert_base_multilingual_cased_finetuned_squad_xx_5.5.0_3.0_1727049211091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("qa_bert_base_multilingual_cased_finetuned_squad","xx") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("qa_bert_base_multilingual_cased_finetuned_squad", "xx") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_bert_base_multilingual_cased_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|xx| +|Size:|665.0 MB| + +## References + +https://huggingface.co/itsamitkumar/qa_bert-base-multilingual-cased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-qnli_roberta_base_seed_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-qnli_roberta_base_seed_3_pipeline_en.md new file mode 100644 index 00000000000000..ea1e623259d30e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-qnli_roberta_base_seed_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English qnli_roberta_base_seed_3_pipeline pipeline RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: qnli_roberta_base_seed_3_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qnli_roberta_base_seed_3_pipeline` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qnli_roberta_base_seed_3_pipeline_en_5.5.0_3.0_1727037198802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qnli_roberta_base_seed_3_pipeline_en_5.5.0_3.0_1727037198802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qnli_roberta_base_seed_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qnli_roberta_base_seed_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qnli_roberta_base_seed_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|461.6 MB| + +## References + +https://huggingface.co/utahnlp/qnli_roberta-base_seed-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-reward_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-reward_model_pipeline_en.md new file mode 100644 index 00000000000000..d342c6078adfaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-reward_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English reward_model_pipeline pipeline RoBertaForSequenceClassification from lillybak +author: John Snow Labs +name: reward_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reward_model_pipeline` is a English model originally trained by lillybak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reward_model_pipeline_en_5.5.0_3.0_1727037852561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reward_model_pipeline_en_5.5.0_3.0_1727037852561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("reward_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("reward_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reward_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/lillybak/reward_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_en.md b/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_en.md new file mode 100644 index 00000000000000..79b23bf22f302e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rl_grp_prj_per_cls RoBertaForSequenceClassification from nasheed +author: John Snow Labs +name: rl_grp_prj_per_cls +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rl_grp_prj_per_cls` is a English model originally trained by nasheed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rl_grp_prj_per_cls_en_5.5.0_3.0_1727016878903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rl_grp_prj_per_cls_en_5.5.0_3.0_1727016878903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("rl_grp_prj_per_cls","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("rl_grp_prj_per_cls", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rl_grp_prj_per_cls| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|422.0 MB| + +## References + +https://huggingface.co/nasheed/rl-grp-prj-per-cls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_pipeline_en.md new file mode 100644 index 00000000000000..78cfd119d4b777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-rl_grp_prj_per_cls_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rl_grp_prj_per_cls_pipeline pipeline RoBertaForSequenceClassification from nasheed +author: John Snow Labs +name: rl_grp_prj_per_cls_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rl_grp_prj_per_cls_pipeline` is a English model originally trained by nasheed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rl_grp_prj_per_cls_pipeline_en_5.5.0_3.0_1727016918434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rl_grp_prj_per_cls_pipeline_en_5.5.0_3.0_1727016918434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rl_grp_prj_per_cls_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rl_grp_prj_per_cls_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rl_grp_prj_per_cls_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.1 MB| + +## References + +https://huggingface.co/nasheed/rl-grp-prj-per-cls + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_en.md new file mode 100644 index 00000000000000..801cd7020186cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robbert_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_emotions RoBertaForSequenceClassification from rroell +author: John Snow Labs +name: robbert_emotions +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_emotions` is a English model originally trained by rroell. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_emotions_en_5.5.0_3.0_1727026865707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_emotions_en_5.5.0_3.0_1727026865707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("robbert_emotions","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robbert_emotions", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/rroell/RoBBERT-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_9_pipeline_en.md new file mode 100644 index 00000000000000..2c1c1a7ed5403b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_9_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_9_pipeline pipeline RoBertaForSequenceClassification from mollypak +author: John Snow Labs +name: roberta_9_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_9_pipeline` is a English model originally trained by mollypak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_9_pipeline_en_5.5.0_3.0_1726972304265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_9_pipeline_en_5.5.0_3.0_1726972304265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.9 MB| + +## References + +https://huggingface.co/mollypak/roberta_9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_en.md new file mode 100644 index 00000000000000..fa6f745efba293 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_augmented_finetuned_atis_5pct_v1 RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_augmented_finetuned_atis_5pct_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_augmented_finetuned_atis_5pct_v1` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_5pct_v1_en_5.5.0_3.0_1727027540028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_5pct_v1_en_5.5.0_3.0_1727027540028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_augmented_finetuned_atis_5pct_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_augmented_finetuned_atis_5pct_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_augmented_finetuned_atis_5pct_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|425.6 MB| + +## References + +https://huggingface.co/benayas/roberta-augmented-finetuned-atis_5pct_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_pipeline_en.md new file mode 100644 index 00000000000000..09663f8de86b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_augmented_finetuned_atis_5pct_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_augmented_finetuned_atis_5pct_v1_pipeline pipeline RoBertaForSequenceClassification from benayas +author: John Snow Labs +name: roberta_augmented_finetuned_atis_5pct_v1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_augmented_finetuned_atis_5pct_v1_pipeline` is a English model originally trained by benayas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_5pct_v1_pipeline_en_5.5.0_3.0_1727027578661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_augmented_finetuned_atis_5pct_v1_pipeline_en_5.5.0_3.0_1727027578661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_augmented_finetuned_atis_5pct_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_augmented_finetuned_atis_5pct_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_augmented_finetuned_atis_5pct_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|425.6 MB| + +## References + +https://huggingface.co/benayas/roberta-augmented-finetuned-atis_5pct_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_en.md new file mode 100644 index 00000000000000..676eb20c52efb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_babe_3epochs RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: roberta_babe_3epochs +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_babe_3epochs` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_babe_3epochs_en_5.5.0_3.0_1727017645046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_babe_3epochs_en_5.5.0_3.0_1727017645046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_babe_3epochs","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_babe_3epochs", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_babe_3epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/jordankrishnayah/ROBERTA-BABE-3epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_pipeline_en.md new file mode 100644 index 00000000000000..bfe5aa4b0d5e6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_babe_3epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_babe_3epochs_pipeline pipeline RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: roberta_babe_3epochs_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_babe_3epochs_pipeline` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_babe_3epochs_pipeline_en_5.5.0_3.0_1727017673953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_babe_3epochs_pipeline_en_5.5.0_3.0_1727017673953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_babe_3epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_babe_3epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_babe_3epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/jordankrishnayah/ROBERTA-BABE-3epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_en.md new file mode 100644 index 00000000000000..4c131a00dd12e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_airlines_news_binary RoBertaForSequenceClassification from dahe827 +author: John Snow Labs +name: roberta_base_airlines_news_binary +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_airlines_news_binary` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_binary_en_5.5.0_3.0_1727036976097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_binary_en_5.5.0_3.0_1727036976097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_airlines_news_binary","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_airlines_news_binary", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_airlines_news_binary| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/dahe827/roberta-base-airlines-news-binary \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_pipeline_en.md new file mode 100644 index 00000000000000..c15045e86d2466 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_airlines_news_binary_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_airlines_news_binary_pipeline pipeline RoBertaForSequenceClassification from dahe827 +author: John Snow Labs +name: roberta_base_airlines_news_binary_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_airlines_news_binary_pipeline` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_binary_pipeline_en_5.5.0_3.0_1727037015137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_binary_pipeline_en_5.5.0_3.0_1727037015137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_airlines_news_binary_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_airlines_news_binary_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_airlines_news_binary_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.4 MB| + +## References + +https://huggingface.co/dahe827/roberta-base-airlines-news-binary + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en.md new file mode 100644 index 00000000000000..d1e8b748df5a75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb RoBertaForSequenceClassification from pamelapaolacb +author: John Snow Labs +name: roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb` is a English model originally trained by pamelapaolacb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en_5.5.0_3.0_1726972309460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_en_5.5.0_3.0_1726972309460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|446.8 MB| + +## References + +https://huggingface.co/pamelapaolacb/roberta-base-bne-jou-amazon_reviews_multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_pipeline_en.md new file mode 100644 index 00000000000000..443f11b0688b0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_cased_finetuned_mnli_pipeline pipeline RoBertaForSequenceClassification from George-Ogden +author: John Snow Labs +name: roberta_base_cased_finetuned_mnli_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_cased_finetuned_mnli_pipeline` is a English model originally trained by George-Ogden. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_cased_finetuned_mnli_pipeline_en_5.5.0_3.0_1727017454794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_cased_finetuned_mnli_pipeline_en_5.5.0_3.0_1727017454794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_cased_finetuned_mnli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_cased_finetuned_mnli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_cased_finetuned_mnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.5 MB| + +## References + +https://huggingface.co/George-Ogden/roberta-base-cased-finetuned-mnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_pipeline_en.md new file mode 100644 index 00000000000000..3d60abc7342a10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_dark_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_dark_pipeline pipeline RoBertaForSequenceClassification from geektech +author: John Snow Labs +name: roberta_base_finetuned_dark_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_dark_pipeline` is a English model originally trained by geektech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_dark_pipeline_en_5.5.0_3.0_1727026475926.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_dark_pipeline_en_5.5.0_3.0_1727026475926.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_dark_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_dark_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_dark_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|423.3 MB| + +## References + +https://huggingface.co/geektech/roberta-base-finetuned-dark + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_en.md new file mode 100644 index 00000000000000..18c8b26b59ac0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_mnli_kuaaangwen RoBertaForSequenceClassification from Kuaaangwen +author: John Snow Labs +name: roberta_base_finetuned_mnli_kuaaangwen +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_mnli_kuaaangwen` is a English model originally trained by Kuaaangwen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mnli_kuaaangwen_en_5.5.0_3.0_1727017592170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mnli_kuaaangwen_en_5.5.0_3.0_1727017592170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_mnli_kuaaangwen","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_mnli_kuaaangwen", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_mnli_kuaaangwen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|465.5 MB| + +## References + +https://huggingface.co/Kuaaangwen/roberta-base-finetuned-mnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_pipeline_en.md new file mode 100644 index 00000000000000..da9136630c384e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_mnli_kuaaangwen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_mnli_kuaaangwen_pipeline pipeline RoBertaForSequenceClassification from Kuaaangwen +author: John Snow Labs +name: roberta_base_finetuned_mnli_kuaaangwen_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_mnli_kuaaangwen_pipeline` is a English model originally trained by Kuaaangwen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mnli_kuaaangwen_pipeline_en_5.5.0_3.0_1727017614208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_mnli_kuaaangwen_pipeline_en_5.5.0_3.0_1727017614208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_mnli_kuaaangwen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_mnli_kuaaangwen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_mnli_kuaaangwen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.5 MB| + +## References + +https://huggingface.co/Kuaaangwen/roberta-base-finetuned-mnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_en.md new file mode 100644 index 00000000000000..6037cc86cc731a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_hoax_classifier_defs_1h2r_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_defs_1h2r RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_defs_1h2r +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_defs_1h2r` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h2r_en_5.5.0_3.0_1727037375415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h2r_en_5.5.0_3.0_1727037375415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_defs_1h2r","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_defs_1h2r", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_defs_1h2r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|432.6 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_defs_1h2r \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_en.md new file mode 100644 index 00000000000000..7530e1a61be7c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_md_gender_bias_trained_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_md_gender_bias_trained RoBertaForSequenceClassification from JakobKaiser +author: John Snow Labs +name: roberta_base_md_gender_bias_trained +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_md_gender_bias_trained` is a English model originally trained by JakobKaiser. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_md_gender_bias_trained_en_5.5.0_3.0_1727017094626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_md_gender_bias_trained_en_5.5.0_3.0_1727017094626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_md_gender_bias_trained","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_md_gender_bias_trained", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_md_gender_bias_trained| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|436.0 MB| + +## References + +https://huggingface.co/JakobKaiser/roberta-base-md_gender_bias-trained \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_en.md new file mode 100644 index 00000000000000..59c30351e5f3c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_oscar_chen RoBertaForSequenceClassification from Oscar-chen +author: John Snow Labs +name: roberta_base_oscar_chen +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_oscar_chen` is a English model originally trained by Oscar-chen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_oscar_chen_en_5.5.0_3.0_1726972171071.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_oscar_chen_en_5.5.0_3.0_1726972171071.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_oscar_chen","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_oscar_chen", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_oscar_chen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|433.1 MB| + +## References + +https://huggingface.co/Oscar-chen/roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_en.md new file mode 100644 index 00000000000000..d02ccd8a5cec47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_philpapers RoBertaEmbeddings from Dhruvil47 +author: John Snow Labs +name: roberta_base_philpapers +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_philpapers` is a English model originally trained by Dhruvil47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_philpapers_en_5.5.0_3.0_1726999716378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_philpapers_en_5.5.0_3.0_1726999716378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_philpapers","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_philpapers","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_philpapers| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/Dhruvil47/roberta-base-philpapers \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_pipeline_en.md new file mode 100644 index 00000000000000..52ab5efce4b4dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_philpapers_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_philpapers_pipeline pipeline RoBertaEmbeddings from Dhruvil47 +author: John Snow Labs +name: roberta_base_philpapers_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_philpapers_pipeline` is a English model originally trained by Dhruvil47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_philpapers_pipeline_en_5.5.0_3.0_1726999737210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_philpapers_pipeline_en_5.5.0_3.0_1726999737210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_philpapers_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_philpapers_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_philpapers_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/Dhruvil47/roberta-base-philpapers + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_pipeline_en.md new file mode 100644 index 00000000000000..17dacec546c471 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_plausibility_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_plausibility_pipeline pipeline RoBertaForSequenceClassification from ianporada +author: John Snow Labs +name: roberta_base_plausibility_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_plausibility_pipeline` is a English model originally trained by ianporada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_plausibility_pipeline_en_5.5.0_3.0_1727026481622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_plausibility_pipeline_en_5.5.0_3.0_1727026481622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_plausibility_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_plausibility_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_plausibility_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.4 MB| + +## References + +https://huggingface.co/ianporada/roberta_base_plausibility + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_pipeline_en.md new file mode 100644 index 00000000000000..605444a6c08441 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_reduced_upper_fabric_pipeline pipeline RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_reduced_upper_fabric_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_reduced_upper_fabric_pipeline` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_fabric_pipeline_en_5.5.0_3.0_1727017758021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_fabric_pipeline_en_5.5.0_3.0_1727017758021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_reduced_upper_fabric_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_reduced_upper_fabric_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_reduced_upper_fabric_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-reduced-Upper_fabric + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_pipeline_tr.md new file mode 100644 index 00000000000000..55b7ebc26b995d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_pipeline_tr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Turkish roberta_base_turkish_uncased_stance_pipeline pipeline RoBertaForSequenceClassification from byunal +author: John Snow Labs +name: roberta_base_turkish_uncased_stance_pipeline +date: 2024-09-22 +tags: [tr, open_source, pipeline, onnx] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_turkish_uncased_stance_pipeline` is a Turkish model originally trained by byunal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_stance_pipeline_tr_5.5.0_3.0_1727017395547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_stance_pipeline_tr_5.5.0_3.0_1727017395547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_turkish_uncased_stance_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_turkish_uncased_stance_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_turkish_uncased_stance_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|463.7 MB| + +## References + +https://huggingface.co/byunal/roberta-base-turkish-uncased-stance + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_tr.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_tr.md new file mode 100644 index 00000000000000..f16c34d53b5502 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_turkish_uncased_stance_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish roberta_base_turkish_uncased_stance RoBertaForSequenceClassification from byunal +author: John Snow Labs +name: roberta_base_turkish_uncased_stance +date: 2024-09-22 +tags: [tr, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_turkish_uncased_stance` is a Turkish model originally trained by byunal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_stance_tr_5.5.0_3.0_1727017374018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_turkish_uncased_stance_tr_5.5.0_3.0_1727017374018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_turkish_uncased_stance","tr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_turkish_uncased_stance", "tr") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_turkish_uncased_stance| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tr| +|Size:|463.7 MB| + +## References + +https://huggingface.co/byunal/roberta-base-turkish-uncased-stance \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_en.md new file mode 100644 index 00000000000000..2e1127737b76ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_mnli_model3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_mnli_model3 RoBertaForSequenceClassification from varun-v-rao +author: John Snow Labs +name: roberta_large_mnli_model3 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_mnli_model3` is a English model originally trained by varun-v-rao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_model3_en_5.5.0_3.0_1727038017764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_mnli_model3_en_5.5.0_3.0_1727038017764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_mnli_model3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_mnli_model3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_mnli_model3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/varun-v-rao/roberta-large-mnli-model3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_sst_2_16_13_smoothed_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_sst_2_16_13_smoothed_en.md new file mode 100644 index 00000000000000..0d35cd12f1441d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_sst_2_16_13_smoothed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_sst_2_16_13_smoothed RoBertaForSequenceClassification from simonycl +author: John Snow Labs +name: roberta_large_sst_2_16_13_smoothed +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_sst_2_16_13_smoothed` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_sst_2_16_13_smoothed_en_5.5.0_3.0_1726972443205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_sst_2_16_13_smoothed_en_5.5.0_3.0_1726972443205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_sst_2_16_13_smoothed","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_sst_2_16_13_smoothed", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_sst_2_16_13_smoothed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/simonycl/roberta-large-sst-2-16-13-smoothed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_en.md new file mode 100644 index 00000000000000..1f7e9c0f500b98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_temp_classifier_v2 RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_temp_classifier_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_temp_classifier_v2` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_v2_en_5.5.0_3.0_1727017028599.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_v2_en_5.5.0_3.0_1727017028599.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_temp_classifier_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_temp_classifier_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_temp_classifier_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta_large_temp_classifier_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_pipeline_en.md new file mode 100644 index 00000000000000..008280336bbb9f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_large_temp_classifier_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_temp_classifier_v2_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_large_temp_classifier_v2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_temp_classifier_v2_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_v2_pipeline_en_5.5.0_3.0_1727017097486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_temp_classifier_v2_pipeline_en_5.5.0_3.0_1727017097486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_temp_classifier_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_temp_classifier_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_temp_classifier_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/research-dump/roberta_large_temp_classifier_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_en.md new file mode 100644 index 00000000000000..f2207c63cf74b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_model_sst2_babylm_challenge_strict_small RoBertaForSequenceClassification from TheBguy87 +author: John Snow Labs +name: roberta_model_sst2_babylm_challenge_strict_small +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_model_sst2_babylm_challenge_strict_small` is a English model originally trained by TheBguy87. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_model_sst2_babylm_challenge_strict_small_en_5.5.0_3.0_1727037298360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_model_sst2_babylm_challenge_strict_small_en_5.5.0_3.0_1727037298360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_model_sst2_babylm_challenge_strict_small","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_model_sst2_babylm_challenge_strict_small", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_model_sst2_babylm_challenge_strict_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|313.7 MB| + +## References + +https://huggingface.co/TheBguy87/roBERTa-Model-sst2-BabyLM-Challenge-Strict-Small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_pipeline_en.md new file mode 100644 index 00000000000000..a959fd5e6862cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_model_sst2_babylm_challenge_strict_small_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_model_sst2_babylm_challenge_strict_small_pipeline pipeline RoBertaForSequenceClassification from TheBguy87 +author: John Snow Labs +name: roberta_model_sst2_babylm_challenge_strict_small_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_model_sst2_babylm_challenge_strict_small_pipeline` is a English model originally trained by TheBguy87. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_model_sst2_babylm_challenge_strict_small_pipeline_en_5.5.0_3.0_1727037316848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_model_sst2_babylm_challenge_strict_small_pipeline_en_5.5.0_3.0_1727037316848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_model_sst2_babylm_challenge_strict_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_model_sst2_babylm_challenge_strict_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_model_sst2_babylm_challenge_strict_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|313.7 MB| + +## References + +https://huggingface.co/TheBguy87/roBERTa-Model-sst2-BabyLM-Challenge-Strict-Small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_en.md new file mode 100644 index 00000000000000..5f1b6da3b83bb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_sentiment_classification RoBertaForSequenceClassification from newsmediabias +author: John Snow Labs +name: roberta_sentiment_classification +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_sentiment_classification` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_sentiment_classification_en_5.5.0_3.0_1727036898297.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_sentiment_classification_en_5.5.0_3.0_1727036898297.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_sentiment_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_sentiment_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_sentiment_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|450.9 MB| + +## References + +https://huggingface.co/newsmediabias/Roberta_Sentiment_Classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_pipeline_en.md new file mode 100644 index 00000000000000..65e0eee659292a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_sentiment_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_sentiment_classification_pipeline pipeline RoBertaForSequenceClassification from newsmediabias +author: John Snow Labs +name: roberta_sentiment_classification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_sentiment_classification_pipeline` is a English model originally trained by newsmediabias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_sentiment_classification_pipeline_en_5.5.0_3.0_1727036931274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_sentiment_classification_pipeline_en_5.5.0_3.0_1727036931274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_sentiment_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_sentiment_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_sentiment_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.0 MB| + +## References + +https://huggingface.co/newsmediabias/Roberta_Sentiment_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_tgmd_large_b_es1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_tgmd_large_b_es1_pipeline_en.md new file mode 100644 index 00000000000000..b79e8dd36e0a17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_tgmd_large_b_es1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tgmd_large_b_es1_pipeline pipeline RoBertaForSequenceClassification from hadish +author: John Snow Labs +name: roberta_tgmd_large_b_es1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tgmd_large_b_es1_pipeline` is a English model originally trained by hadish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tgmd_large_b_es1_pipeline_en_5.5.0_3.0_1726971838898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tgmd_large_b_es1_pipeline_en_5.5.0_3.0_1726971838898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tgmd_large_b_es1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tgmd_large_b_es1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tgmd_large_b_es1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hadish/roberta-TGMD-large-B-ES1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_en.md b/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_en.md new file mode 100644 index 00000000000000..1f8d0222d54486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertalex_ptbr_ulyssesner RoBertaForTokenClassification from giliardgodoi +author: John Snow Labs +name: robertalex_ptbr_ulyssesner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalex_ptbr_ulyssesner` is a English model originally trained by giliardgodoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalex_ptbr_ulyssesner_en_5.5.0_3.0_1727048953872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalex_ptbr_ulyssesner_en_5.5.0_3.0_1727048953872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("robertalex_ptbr_ulyssesner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("robertalex_ptbr_ulyssesner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalex_ptbr_ulyssesner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|434.5 MB| + +## References + +https://huggingface.co/giliardgodoi/robertalex-ptbr-ulyssesner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_pipeline_en.md new file mode 100644 index 00000000000000..8c7bf6d77fe8f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-robertalex_ptbr_ulyssesner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertalex_ptbr_ulyssesner_pipeline pipeline RoBertaForTokenClassification from giliardgodoi +author: John Snow Labs +name: robertalex_ptbr_ulyssesner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalex_ptbr_ulyssesner_pipeline` is a English model originally trained by giliardgodoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalex_ptbr_ulyssesner_pipeline_en_5.5.0_3.0_1727048978989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalex_ptbr_ulyssesner_pipeline_en_5.5.0_3.0_1727048978989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertalex_ptbr_ulyssesner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertalex_ptbr_ulyssesner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalex_ptbr_ulyssesner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.5 MB| + +## References + +https://huggingface.co/giliardgodoi/robertalex-ptbr-ulyssesner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-romansh_prima_en.md b/docs/_posts/ahmedlone127/2024-09-22-romansh_prima_en.md new file mode 100644 index 00000000000000..bc4a561889ab06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-romansh_prima_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English romansh_prima RoBertaForSequenceClassification from davidgaofc +author: John Snow Labs +name: romansh_prima +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`romansh_prima` is a English model originally trained by davidgaofc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/romansh_prima_en_5.5.0_3.0_1727037129621.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/romansh_prima_en_5.5.0_3.0_1727037129621.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("romansh_prima","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("romansh_prima", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|romansh_prima| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/davidgaofc/RM_prima \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_en.md b/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_en.md new file mode 100644 index 00000000000000..022f7a32f09149 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rubert_finetuned_ner_dondosss BertForTokenClassification from dondosss +author: John Snow Labs +name: rubert_finetuned_ner_dondosss +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_finetuned_ner_dondosss` is a English model originally trained by dondosss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_finetuned_ner_dondosss_en_5.5.0_3.0_1726977944951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_finetuned_ner_dondosss_en_5.5.0_3.0_1726977944951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("rubert_finetuned_ner_dondosss","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("rubert_finetuned_ner_dondosss", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_finetuned_ner_dondosss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|664.3 MB| + +## References + +https://huggingface.co/dondosss/rubert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_pipeline_en.md new file mode 100644 index 00000000000000..916dd7dd6e5a14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-rubert_finetuned_ner_dondosss_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rubert_finetuned_ner_dondosss_pipeline pipeline BertForTokenClassification from dondosss +author: John Snow Labs +name: rubert_finetuned_ner_dondosss_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_finetuned_ner_dondosss_pipeline` is a English model originally trained by dondosss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_finetuned_ner_dondosss_pipeline_en_5.5.0_3.0_1726977973677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_finetuned_ner_dondosss_pipeline_en_5.5.0_3.0_1726977973677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_finetuned_ner_dondosss_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_finetuned_ner_dondosss_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_finetuned_ner_dondosss_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|664.3 MB| + +## References + +https://huggingface.co/dondosss/rubert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en.md new file mode 100644 index 00000000000000..81780bc6c6de8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en_5.5.0_3.0_1727009653857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline_en_5.5.0_3.0_1727009653857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_po_copy_cdf_english_d2_data_english_cardiff_eng_only_beta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|672.4 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-PO-COPY-CDF-EN-D2_data-en-cardiff_eng_only_beta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_ar.md b/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_ar.md new file mode 100644 index 00000000000000..8664848875c7b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v6 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v6 +date: 2024-09-22 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v6` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v6_ar_5.5.0_3.0_1727044755571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v6_ar_5.5.0_3.0_1727044755571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v6","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v6","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_pipeline_ar.md new file mode 100644 index 00000000000000..67b1f407965509 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_arabertmo_base_v6_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v6_pipeline pipeline BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v6_pipeline +date: 2024-09-22 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v6_pipeline` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v6_pipeline_ar_5.5.0_3.0_1727044776904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v6_pipeline_ar_5.5.0_3.0_1727044776904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_arabertmo_base_v6_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_arabertmo_base_v6_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_bn.md b/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_bn.md new file mode 100644 index 00000000000000..04a64d133664a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_bn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bengali sent_banglabert_generator BertSentenceEmbeddings from csebuetnlp +author: John Snow Labs +name: sent_banglabert_generator +date: 2024-09-22 +tags: [bn, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_banglabert_generator` is a Bengali model originally trained by csebuetnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_banglabert_generator_bn_5.5.0_3.0_1727004429749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_banglabert_generator_bn_5.5.0_3.0_1727004429749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_banglabert_generator","bn") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_banglabert_generator","bn") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_banglabert_generator| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|bn| +|Size:|130.0 MB| + +## References + +https://huggingface.co/csebuetnlp/banglabert_generator \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_pipeline_bn.md new file mode 100644 index 00000000000000..db7e4edae30845 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_banglabert_generator_pipeline_bn.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Bengali sent_banglabert_generator_pipeline pipeline BertSentenceEmbeddings from csebuetnlp +author: John Snow Labs +name: sent_banglabert_generator_pipeline +date: 2024-09-22 +tags: [bn, open_source, pipeline, onnx] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_banglabert_generator_pipeline` is a Bengali model originally trained by csebuetnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_banglabert_generator_pipeline_bn_5.5.0_3.0_1727004435762.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_banglabert_generator_pipeline_bn_5.5.0_3.0_1727004435762.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_banglabert_generator_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_banglabert_generator_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_banglabert_generator_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|130.5 MB| + +## References + +https://huggingface.co/csebuetnlp/banglabert_generator + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_cased_model_attribution_challenge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_cased_model_attribution_challenge_pipeline_en.md new file mode 100644 index 00000000000000..b521740b6cebdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_cased_model_attribution_challenge_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_cased_model_attribution_challenge_pipeline pipeline BertSentenceEmbeddings from model-attribution-challenge +author: John Snow Labs +name: sent_bert_base_cased_model_attribution_challenge_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_model_attribution_challenge_pipeline` is a English model originally trained by model-attribution-challenge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_model_attribution_challenge_pipeline_en_5.5.0_3.0_1727004112566.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_model_attribution_challenge_pipeline_en_5.5.0_3.0_1727004112566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_cased_model_attribution_challenge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_cased_model_attribution_challenge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_model_attribution_challenge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/model-attribution-challenge/bert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_en.md new file mode 100644 index 00000000000000..2f3708719730ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_danish_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_danish_cased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_danish_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_danish_cased_en_5.5.0_3.0_1727044582855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_danish_cased_en_5.5.0_3.0_1727044582855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_danish_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_danish_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_danish_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-da-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_pipeline_en.md new file mode 100644 index 00000000000000..47515528d66cd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_danish_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_danish_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_danish_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_danish_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_danish_cased_pipeline_en_5.5.0_3.0_1727044604106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_danish_cased_pipeline_en_5.5.0_3.0_1727044604106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_danish_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_danish_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_danish_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.1 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-da-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_pipeline_en.md new file mode 100644 index 00000000000000..7529d98ac19cd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_italian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_italian_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_italian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_italian_cased_pipeline_en_5.5.0_3.0_1727047201097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_italian_cased_pipeline_en_5.5.0_3.0_1727047201097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_italian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_italian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_italian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|418.5 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-it-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_de.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_de.md new file mode 100644 index 00000000000000..4d024e245f4387 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_cased_finetuned_swiss_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German sent_bert_base_german_cased_finetuned_swiss BertSentenceEmbeddings from statworx +author: John Snow Labs +name: sent_bert_base_german_cased_finetuned_swiss +date: 2024-09-22 +tags: [de, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_cased_finetuned_swiss` is a German model originally trained by statworx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_cased_finetuned_swiss_de_5.5.0_3.0_1727047043482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_cased_finetuned_swiss_de_5.5.0_3.0_1727047043482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_german_cased_finetuned_swiss","de") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_german_cased_finetuned_swiss","de") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_cased_finetuned_swiss| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|de| +|Size:|406.9 MB| + +## References + +https://huggingface.co/statworx/bert-base-german-cased-finetuned-swiss \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_en.md new file mode 100644 index 00000000000000..f60dee84235952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_german_europeana_td_cased BertSentenceEmbeddings from dbmdz +author: John Snow Labs +name: sent_bert_base_german_europeana_td_cased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_europeana_td_cased` is a English model originally trained by dbmdz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_europeana_td_cased_en_5.5.0_3.0_1727013463916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_europeana_td_cased_en_5.5.0_3.0_1727013463916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_german_europeana_td_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_german_europeana_td_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_europeana_td_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.4 MB| + +## References + +https://huggingface.co/dbmdz/bert-base-german-europeana-td-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_pipeline_en.md new file mode 100644 index 00000000000000..5dd134da0afeb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_german_europeana_td_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_german_europeana_td_cased_pipeline pipeline BertSentenceEmbeddings from dbmdz +author: John Snow Labs +name: sent_bert_base_german_europeana_td_cased_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_german_europeana_td_cased_pipeline` is a English model originally trained by dbmdz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_europeana_td_cased_pipeline_en_5.5.0_3.0_1727013483644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_german_europeana_td_cased_pipeline_en_5.5.0_3.0_1727013483644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_german_europeana_td_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_german_europeana_td_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_german_europeana_td_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.9 MB| + +## References + +https://huggingface.co/dbmdz/bert-base-german-europeana-td-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_based_encoder_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_based_encoder_pipeline_xx.md new file mode 100644 index 00000000000000..cb7d73d54242ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_multilingual_cased_based_encoder_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_based_encoder_pipeline pipeline BertSentenceEmbeddings from shsha0110 +author: John Snow Labs +name: sent_bert_base_multilingual_cased_based_encoder_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_based_encoder_pipeline` is a Multilingual model originally trained by shsha0110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_based_encoder_pipeline_xx_5.5.0_3.0_1727001937322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_based_encoder_pipeline_xx_5.5.0_3.0_1727001937322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_based_encoder_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_based_encoder_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_based_encoder_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.4 MB| + +## References + +https://huggingface.co/shsha0110/bert-base-multilingual-cased-based-encoder + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_en.md new file mode 100644 index 00000000000000..fcb76cb3520a72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802 BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802 +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_en_5.5.0_3.0_1727046935475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_en_5.5.0_3.0_1727046935475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_pipeline_en.md new file mode 100644 index 00000000000000..957efba71159c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_1802_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_pipeline pipeline BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_pipeline_en_5.5.0_3.0_1727046955301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_pipeline_en_5.5.0_3.0_1727046955301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_1802_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_1802_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_dstc9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_dstc9_pipeline_en.md new file mode 100644 index 00000000000000..eaf1236e33a07c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_dstc9_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_dstc9_pipeline pipeline BertSentenceEmbeddings from wilsontam +author: John Snow Labs +name: sent_bert_base_uncased_dstc9_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_dstc9_pipeline` is a English model originally trained by wilsontam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dstc9_pipeline_en_5.5.0_3.0_1727004514773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_dstc9_pipeline_en_5.5.0_3.0_1727004514773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_dstc9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_dstc9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_dstc9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/wilsontam/bert-base-uncased-dstc9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_en.md new file mode 100644 index 00000000000000..dc6b8cfaf1fb72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_hndc BertSentenceEmbeddings from hndc +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_hndc +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_hndc` is a English model originally trained by hndc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_hndc_en_5.5.0_3.0_1727013706027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_hndc_en_5.5.0_3.0_1727013706027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_hndc","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_hndc","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_hndc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/hndc/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_pipeline_en.md new file mode 100644 index 00000000000000..118566c4a9b290 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_hndc_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_hndc_pipeline pipeline BertSentenceEmbeddings from hndc +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_hndc_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_hndc_pipeline` is a English model originally trained by hndc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_hndc_pipeline_en_5.5.0_3.0_1727013726241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_hndc_pipeline_en_5.5.0_3.0_1727013726241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_hndc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_hndc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_hndc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/hndc/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_pipeline_en.md new file mode 100644 index 00000000000000..71c67c9420867a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_issues_128_transformersbook_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_transformersbook_pipeline pipeline BertSentenceEmbeddings from transformersbook +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_transformersbook_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_transformersbook_pipeline` is a English model originally trained by transformersbook. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_transformersbook_pipeline_en_5.5.0_3.0_1727013642163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_transformersbook_pipeline_en_5.5.0_3.0_1727013642163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_transformersbook_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_transformersbook_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_transformersbook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/transformersbook/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_l10_h256_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_l10_h256_uncased_en.md new file mode 100644 index 00000000000000..0b960e6865657e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_l10_h256_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_l10_h256_uncased BertSentenceEmbeddings from gaunernst +author: John Snow Labs +name: sent_bert_l10_h256_uncased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_l10_h256_uncased` is a English model originally trained by gaunernst. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_l10_h256_uncased_en_5.5.0_3.0_1727004245964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_l10_h256_uncased_en_5.5.0_3.0_1727004245964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_l10_h256_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_l10_h256_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_l10_h256_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|59.6 MB| + +## References + +https://huggingface.co/gaunernst/bert-L10-H256-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_en.md new file mode 100644 index 00000000000000..4f627293a04bd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_small_cord19 BertSentenceEmbeddings from NeuML +author: John Snow Labs +name: sent_bert_small_cord19 +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_small_cord19` is a English model originally trained by NeuML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_small_cord19_en_5.5.0_3.0_1727046962818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_small_cord19_en_5.5.0_3.0_1727046962818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_cord19","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_small_cord19","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_small_cord19| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|130.6 MB| + +## References + +https://huggingface.co/NeuML/bert-small-cord19 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_pipeline_en.md new file mode 100644 index 00000000000000..f4eb9e1b1b9d70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_small_cord19_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_small_cord19_pipeline pipeline BertSentenceEmbeddings from NeuML +author: John Snow Labs +name: sent_bert_small_cord19_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_small_cord19_pipeline` is a English model originally trained by NeuML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_small_cord19_pipeline_en_5.5.0_3.0_1727046968827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_small_cord19_pipeline_en_5.5.0_3.0_1727046968827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_small_cord19_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_small_cord19_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_small_cord19_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|131.1 MB| + +## References + +https://huggingface.co/NeuML/bert-small-cord19 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_en.md new file mode 100644 index 00000000000000..d5deaf6cfe92af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_two BertSentenceEmbeddings from emma7897 +author: John Snow Labs +name: sent_bert_two +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_two` is a English model originally trained by emma7897. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_two_en_5.5.0_3.0_1727044311067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_two_en_5.5.0_3.0_1727044311067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_two","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_two","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_two| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/emma7897/bert_two \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_pipeline_en.md new file mode 100644 index 00000000000000..84d7ac1d97978b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_two_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_two_pipeline pipeline BertSentenceEmbeddings from emma7897 +author: John Snow Labs +name: sent_bert_two_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_two_pipeline` is a English model originally trained by emma7897. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_two_pipeline_en_5.5.0_3.0_1727044331597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_two_pipeline_en_5.5.0_3.0_1727044331597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_two_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_two_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_two_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/emma7897/bert_two + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_pipeline_uz.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_pipeline_uz.md new file mode 100644 index 00000000000000..916dd9201cd015 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_pipeline_uz.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Uzbek sent_bertbek_news_big_cased_pipeline pipeline BertSentenceEmbeddings from elmurod1202 +author: John Snow Labs +name: sent_bertbek_news_big_cased_pipeline +date: 2024-09-22 +tags: [uz, open_source, pipeline, onnx] +task: Embeddings +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbek_news_big_cased_pipeline` is a Uzbek model originally trained by elmurod1202. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbek_news_big_cased_pipeline_uz_5.5.0_3.0_1727047021905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbek_news_big_cased_pipeline_uz_5.5.0_3.0_1727047021905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bertbek_news_big_cased_pipeline", lang = "uz") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bertbek_news_big_cased_pipeline", lang = "uz") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbek_news_big_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|uz| +|Size:|406.0 MB| + +## References + +https://huggingface.co/elmurod1202/bertbek-news-big-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_uz.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_uz.md new file mode 100644 index 00000000000000..4f2b4916bf8bc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertbek_news_big_cased_uz.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Uzbek sent_bertbek_news_big_cased BertSentenceEmbeddings from elmurod1202 +author: John Snow Labs +name: sent_bertbek_news_big_cased +date: 2024-09-22 +tags: [uz, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: uz +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbek_news_big_cased` is a Uzbek model originally trained by elmurod1202. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbek_news_big_cased_uz_5.5.0_3.0_1727047001984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbek_news_big_cased_uz_5.5.0_3.0_1727047001984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertbek_news_big_cased","uz") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertbek_news_big_cased","uz") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbek_news_big_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|uz| +|Size:|405.5 MB| + +## References + +https://huggingface.co/elmurod1202/bertbek-news-big-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pipeline_pt.md new file mode 100644 index 00000000000000..f0b7cb5046ded9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_bertimbau_base_finetuned_lener_breton_pipeline pipeline BertSentenceEmbeddings from Luciano +author: John Snow Labs +name: sent_bertimbau_base_finetuned_lener_breton_pipeline +date: 2024-09-22 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbau_base_finetuned_lener_breton_pipeline` is a Portuguese model originally trained by Luciano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbau_base_finetuned_lener_breton_pipeline_pt_5.5.0_3.0_1727013445081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbau_base_finetuned_lener_breton_pipeline_pt_5.5.0_3.0_1727013445081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bertimbau_base_finetuned_lener_breton_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bertimbau_base_finetuned_lener_breton_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbau_base_finetuned_lener_breton_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|406.4 MB| + +## References + +https://huggingface.co/Luciano/bertimbau-base-finetuned-lener-br + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pt.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pt.md new file mode 100644 index 00000000000000..3ecd0b1f5e29cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bertimbau_base_finetuned_lener_breton_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese sent_bertimbau_base_finetuned_lener_breton BertSentenceEmbeddings from Luciano +author: John Snow Labs +name: sent_bertimbau_base_finetuned_lener_breton +date: 2024-09-22 +tags: [pt, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertimbau_base_finetuned_lener_breton` is a Portuguese model originally trained by Luciano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertimbau_base_finetuned_lener_breton_pt_5.5.0_3.0_1727013425342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertimbau_base_finetuned_lener_breton_pt_5.5.0_3.0_1727013425342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbau_base_finetuned_lener_breton","pt") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertimbau_base_finetuned_lener_breton","pt") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertimbau_base_finetuned_lener_breton| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|pt| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Luciano/bertimbau-base-finetuned-lener-br \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_es.md b/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_es.md new file mode 100644 index 00000000000000..1d54989658055b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish sent_beto_clinical_wl_spanish BertSentenceEmbeddings from plncmm +author: John Snow Labs +name: sent_beto_clinical_wl_spanish +date: 2024-09-22 +tags: [es, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_beto_clinical_wl_spanish` is a Castilian, Spanish model originally trained by plncmm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_beto_clinical_wl_spanish_es_5.5.0_3.0_1727044736651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_beto_clinical_wl_spanish_es_5.5.0_3.0_1727044736651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_beto_clinical_wl_spanish","es") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_beto_clinical_wl_spanish","es") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_beto_clinical_wl_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/plncmm/beto-clinical-wl-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_pipeline_es.md new file mode 100644 index 00000000000000..406f0768ac0384 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_beto_clinical_wl_spanish_pipeline_es.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Castilian, Spanish sent_beto_clinical_wl_spanish_pipeline pipeline BertSentenceEmbeddings from plncmm +author: John Snow Labs +name: sent_beto_clinical_wl_spanish_pipeline +date: 2024-09-22 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_beto_clinical_wl_spanish_pipeline` is a Castilian, Spanish model originally trained by plncmm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_beto_clinical_wl_spanish_pipeline_es_5.5.0_3.0_1727044757989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_beto_clinical_wl_spanish_pipeline_es_5.5.0_3.0_1727044757989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_beto_clinical_wl_spanish_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_beto_clinical_wl_spanish_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_beto_clinical_wl_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|410.2 MB| + +## References + +https://huggingface.co/plncmm/beto-clinical-wl-es + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_pipeline_en.md new file mode 100644 index 00000000000000..5b32125b66562f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bibert_v0_1_pipeline pipeline BertSentenceEmbeddings from yugen-ok +author: John Snow Labs +name: sent_bibert_v0_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bibert_v0_1_pipeline` is a English model originally trained by yugen-ok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bibert_v0_1_pipeline_en_5.5.0_3.0_1727044478271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bibert_v0_1_pipeline_en_5.5.0_3.0_1727044478271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bibert_v0_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bibert_v0_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bibert_v0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/yugen-ok/bibert-v0.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_en.md new file mode 100644 index 00000000000000..96e4d099f274e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bioptimus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bioptimus BertSentenceEmbeddings from rttl-ai +author: John Snow Labs +name: sent_bioptimus +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bioptimus` is a English model originally trained by rttl-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bioptimus_en_5.5.0_3.0_1727047066143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bioptimus_en_5.5.0_3.0_1727047066143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bioptimus","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bioptimus","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bioptimus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/rttl-ai/BIOptimus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_en.md new file mode 100644 index 00000000000000..f12ff35693d2be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_incel_alberto BertSentenceEmbeddings from pgajo +author: John Snow Labs +name: sent_incel_alberto +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_incel_alberto` is a English model originally trained by pgajo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_incel_alberto_en_5.5.0_3.0_1727044458578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_incel_alberto_en_5.5.0_3.0_1727044458578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_incel_alberto","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_incel_alberto","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_incel_alberto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|688.7 MB| + +## References + +https://huggingface.co/pgajo/incel-alberto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_pipeline_en.md new file mode 100644 index 00000000000000..01fd9c274bf04f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_incel_alberto_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_incel_alberto_pipeline pipeline BertSentenceEmbeddings from pgajo +author: John Snow Labs +name: sent_incel_alberto_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_incel_alberto_pipeline` is a English model originally trained by pgajo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_incel_alberto_pipeline_en_5.5.0_3.0_1727044495226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_incel_alberto_pipeline_en_5.5.0_3.0_1727044495226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_incel_alberto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_incel_alberto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_incel_alberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|689.3 MB| + +## References + +https://huggingface.co/pgajo/incel-alberto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_indobert_base_p2_finetuned_mer_80k_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_indobert_base_p2_finetuned_mer_80k_en.md new file mode 100644 index 00000000000000..4850bb1194e7da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_indobert_base_p2_finetuned_mer_80k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_indobert_base_p2_finetuned_mer_80k BertSentenceEmbeddings from stevenwh +author: John Snow Labs +name: sent_indobert_base_p2_finetuned_mer_80k +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_indobert_base_p2_finetuned_mer_80k` is a English model originally trained by stevenwh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_indobert_base_p2_finetuned_mer_80k_en_5.5.0_3.0_1727001702109.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_indobert_base_p2_finetuned_mer_80k_en_5.5.0_3.0_1727001702109.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_indobert_base_p2_finetuned_mer_80k","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_indobert_base_p2_finetuned_mer_80k","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_indobert_base_p2_finetuned_mer_80k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/stevenwh/indobert-base-p2-finetuned-mer-80k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_en.md new file mode 100644 index 00000000000000..47f0b2e36e6f51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_knowbias_bert_base_uncased_race BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_knowbias_bert_base_uncased_race +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_knowbias_bert_base_uncased_race` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_race_en_5.5.0_3.0_1727013798024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_race_en_5.5.0_3.0_1727013798024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_knowbias_bert_base_uncased_race","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_knowbias_bert_base_uncased_race","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_knowbias_bert_base_uncased_race| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/squiduu/knowbias-bert-base-uncased-race \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_pipeline_en.md new file mode 100644 index 00000000000000..04b6ed37cfcd8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_knowbias_bert_base_uncased_race_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_knowbias_bert_base_uncased_race_pipeline pipeline BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_knowbias_bert_base_uncased_race_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_knowbias_bert_base_uncased_race_pipeline` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_race_pipeline_en_5.5.0_3.0_1727013817401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_race_pipeline_en_5.5.0_3.0_1727013817401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_knowbias_bert_base_uncased_race_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_knowbias_bert_base_uncased_race_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_knowbias_bert_base_uncased_race_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/squiduu/knowbias-bert-base-uncased-race + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_en.md new file mode 100644 index 00000000000000..2ef2d3f55b1dc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_python_code_comment_classification BertSentenceEmbeddings from ZarahShibli +author: John Snow Labs +name: sent_python_code_comment_classification +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_python_code_comment_classification` is a English model originally trained by ZarahShibli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_python_code_comment_classification_en_5.5.0_3.0_1727047423894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_python_code_comment_classification_en_5.5.0_3.0_1727047423894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_python_code_comment_classification","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_python_code_comment_classification","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_python_code_comment_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/ZarahShibli/python-code-comment-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_pipeline_en.md new file mode 100644 index 00000000000000..fb08de1298413e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_python_code_comment_classification_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_python_code_comment_classification_pipeline pipeline BertSentenceEmbeddings from ZarahShibli +author: John Snow Labs +name: sent_python_code_comment_classification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_python_code_comment_classification_pipeline` is a English model originally trained by ZarahShibli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_python_code_comment_classification_pipeline_en_5.5.0_3.0_1727047443516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_python_code_comment_classification_pipeline_en_5.5.0_3.0_1727047443516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_python_code_comment_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_python_code_comment_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_python_code_comment_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ZarahShibli/python-code-comment-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_splade_v3_lexical_nirantk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_splade_v3_lexical_nirantk_pipeline_en.md new file mode 100644 index 00000000000000..8df612c5099bcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_splade_v3_lexical_nirantk_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_splade_v3_lexical_nirantk_pipeline pipeline BertSentenceEmbeddings from nirantk +author: John Snow Labs +name: sent_splade_v3_lexical_nirantk_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_splade_v3_lexical_nirantk_pipeline` is a English model originally trained by nirantk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_splade_v3_lexical_nirantk_pipeline_en_5.5.0_3.0_1727001497745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_splade_v3_lexical_nirantk_pipeline_en_5.5.0_3.0_1727001497745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_splade_v3_lexical_nirantk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_splade_v3_lexical_nirantk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_splade_v3_lexical_nirantk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/nirantk/splade-v3-lexical + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_pipeline_en.md new file mode 100644 index 00000000000000..1623d68d2fc3cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentence_classification_bitnet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentence_classification_bitnet_pipeline pipeline DistilBertForSequenceClassification from sanjeev-bhandari01 +author: John Snow Labs +name: sentence_classification_bitnet_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentence_classification_bitnet_pipeline` is a English model originally trained by sanjeev-bhandari01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentence_classification_bitnet_pipeline_en_5.5.0_3.0_1726980426598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentence_classification_bitnet_pipeline_en_5.5.0_3.0_1726980426598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentence_classification_bitnet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentence_classification_bitnet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentence_classification_bitnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|248.5 MB| + +## References + +https://huggingface.co/sanjeev-bhandari01/sentence_classification_bitnet + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment1_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment1_en.md new file mode 100644 index 00000000000000..829dd2a33a7b9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment1 DistilBertForSequenceClassification from ben-ongys +author: John Snow Labs +name: sentiment1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment1` is a English model originally trained by ben-ongys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment1_en_5.5.0_3.0_1727035375346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment1_en_5.5.0_3.0_1727035375346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-ongys/sentiment1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment1_pipeline_en.md new file mode 100644 index 00000000000000..dae71b88ddfa03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment1_pipeline pipeline DistilBertForSequenceClassification from ben-ongys +author: John Snow Labs +name: sentiment1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment1_pipeline` is a English model originally trained by ben-ongys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment1_pipeline_en_5.5.0_3.0_1727035389325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment1_pipeline_en_5.5.0_3.0_1727035389325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ben-ongys/sentiment1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..e32d4bd6f9f625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_distilbert_pipeline pipeline DistilBertForSequenceClassification from arnavmahapatra +author: John Snow Labs +name: sentiment_analysis_distilbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_distilbert_pipeline` is a English model originally trained by arnavmahapatra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_pipeline_en_5.5.0_3.0_1726980239220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_distilbert_pipeline_en_5.5.0_3.0_1726980239220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/arnavmahapatra/sentiment-analysis-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_en.md new file mode 100644 index 00000000000000..556f9c80bf3cb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_llm RoBertaForSequenceClassification from Agra2002 +author: John Snow Labs +name: sentiment_analysis_llm +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_llm` is a English model originally trained by Agra2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_llm_en_5.5.0_3.0_1726972503397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_llm_en_5.5.0_3.0_1726972503397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_llm","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_llm", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_llm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/Agra2002/sentiment_analysis_LLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_pipeline_en.md new file mode 100644 index 00000000000000..beec11fa196a8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_llm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_llm_pipeline pipeline RoBertaForSequenceClassification from Agra2002 +author: John Snow Labs +name: sentiment_analysis_llm_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_llm_pipeline` is a English model originally trained by Agra2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_llm_pipeline_en_5.5.0_3.0_1726972524131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_llm_pipeline_en_5.5.0_3.0_1726972524131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_llm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_llm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_llm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.7 MB| + +## References + +https://huggingface.co/Agra2002/sentiment_analysis_LLM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_pipeline_en.md new file mode 100644 index 00000000000000..d274ee64ced64e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_shadez25_pipeline pipeline DistilBertForSequenceClassification from Shadez25 +author: John Snow Labs +name: sentiment_analysis_shadez25_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_shadez25_pipeline` is a English model originally trained by Shadez25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_shadez25_pipeline_en_5.5.0_3.0_1727033506405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_shadez25_pipeline_en_5.5.0_3.0_1727033506405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_shadez25_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_shadez25_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_shadez25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shadez25/sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en.md new file mode 100644 index 00000000000000..ab0cb502546e1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en_5.5.0_3.0_1727026444053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline_en_5.5.0_3.0_1727026444053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random0_seed2_twitter_roberta_base_dec2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random0_seed2-twitter-roberta-base-dec2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en.md new file mode 100644 index 00000000000000..d8f2cf363a8735 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1727037162690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_en_5.5.0_3.0_1727037162690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random1_seed1-twitter-roberta-base-2019-90m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en.md new file mode 100644 index 00000000000000..c5f1b3b9a39b29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1727037186523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline_en_5.5.0_3.0_1727037186523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random1_seed1_twitter_roberta_base_2019_90m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random1_seed1-twitter-roberta-base-2019-90m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..5c5e5ad3618c74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_roberta_base_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_roberta_base_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_roberta_base_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en_5.5.0_3.0_1727026620010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_roberta_base_pipeline_en_5.5.0_3.0_1727026620010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed0_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed0_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|430.5 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..c385fdf4a48bd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727017012982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727017012982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_temporal_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_temporal-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_en.md b/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_en.md new file mode 100644 index 00000000000000..db113779642aff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sequence_classification BertForSequenceClassification from xysmalobia +author: John Snow Labs +name: sequence_classification +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sequence_classification` is a English model originally trained by xysmalobia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sequence_classification_en_5.5.0_3.0_1727030482744.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sequence_classification_en_5.5.0_3.0_1727030482744.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("sequence_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("sequence_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sequence_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/xysmalobia/sequence_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_pipeline_en.md new file mode 100644 index 00000000000000..74d5ca00fc9a90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sequence_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sequence_classification_pipeline pipeline BertForSequenceClassification from xysmalobia +author: John Snow Labs +name: sequence_classification_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sequence_classification_pipeline` is a English model originally trained by xysmalobia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sequence_classification_pipeline_en_5.5.0_3.0_1727030503598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sequence_classification_pipeline_en_5.5.0_3.0_1727030503598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sequence_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sequence_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sequence_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/xysmalobia/sequence_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_en.md b/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_en.md new file mode 100644 index 00000000000000..4caf575a0af09e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English smsa_xlm_r XlmRoBertaForSequenceClassification from Cincin-nvp +author: John Snow Labs +name: smsa_xlm_r +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`smsa_xlm_r` is a English model originally trained by Cincin-nvp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/smsa_xlm_r_en_5.5.0_3.0_1727009897441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/smsa_xlm_r_en_5.5.0_3.0_1727009897441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("smsa_xlm_r","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("smsa_xlm_r", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|smsa_xlm_r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|847.9 MB| + +## References + +https://huggingface.co/Cincin-nvp/SmSA_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_pipeline_en.md new file mode 100644 index 00000000000000..ec775e973359b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-smsa_xlm_r_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English smsa_xlm_r_pipeline pipeline XlmRoBertaForSequenceClassification from Cincin-nvp +author: John Snow Labs +name: smsa_xlm_r_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`smsa_xlm_r_pipeline` is a English model originally trained by Cincin-nvp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/smsa_xlm_r_pipeline_en_5.5.0_3.0_1727009966108.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/smsa_xlm_r_pipeline_en_5.5.0_3.0_1727009966108.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("smsa_xlm_r_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("smsa_xlm_r_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|smsa_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|847.9 MB| + +## References + +https://huggingface.co/Cincin-nvp/SmSA_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-smsphishing_en.md b/docs/_posts/ahmedlone127/2024-09-22-smsphishing_en.md new file mode 100644 index 00000000000000..0a096cccc5cee6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-smsphishing_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English smsphishing RoBertaForSequenceClassification from matanbn +author: John Snow Labs +name: smsphishing +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`smsphishing` is a English model originally trained by matanbn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/smsphishing_en_5.5.0_3.0_1727017042474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/smsphishing_en_5.5.0_3.0_1727017042474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("smsphishing","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("smsphishing", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|smsphishing| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/matanbn/smsPhishing \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-smsphishing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-smsphishing_pipeline_en.md new file mode 100644 index 00000000000000..7e082211527fa9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-smsphishing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English smsphishing_pipeline pipeline RoBertaForSequenceClassification from matanbn +author: John Snow Labs +name: smsphishing_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`smsphishing_pipeline` is a English model originally trained by matanbn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/smsphishing_pipeline_en_5.5.0_3.0_1727017078269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/smsphishing_pipeline_en_5.5.0_3.0_1727017078269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("smsphishing_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("smsphishing_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|smsphishing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.3 MB| + +## References + +https://huggingface.co/matanbn/smsPhishing + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-social_orientation_en.md b/docs/_posts/ahmedlone127/2024-09-22-social_orientation_en.md new file mode 100644 index 00000000000000..92e8806d3f9109 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-social_orientation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English social_orientation DistilBertForSequenceClassification from tee-oh-double-dee +author: John Snow Labs +name: social_orientation +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`social_orientation` is a English model originally trained by tee-oh-double-dee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/social_orientation_en_5.5.0_3.0_1727033470309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/social_orientation_en_5.5.0_3.0_1727033470309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("social_orientation","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("social_orientation", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|social_orientation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tee-oh-double-dee/social-orientation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-social_orientation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-social_orientation_pipeline_en.md new file mode 100644 index 00000000000000..9eb1cf4e72bad8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-social_orientation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English social_orientation_pipeline pipeline DistilBertForSequenceClassification from tee-oh-double-dee +author: John Snow Labs +name: social_orientation_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`social_orientation_pipeline` is a English model originally trained by tee-oh-double-dee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/social_orientation_pipeline_en_5.5.0_3.0_1727033484452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/social_orientation_pipeline_en_5.5.0_3.0_1727033484452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("social_orientation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("social_orientation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|social_orientation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tee-oh-double-dee/social-orientation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-spam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-spam_pipeline_en.md new file mode 100644 index 00000000000000..9df6863b68e366 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-spam_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spam_pipeline pipeline DistilBertForSequenceClassification from Luisdahuis +author: John Snow Labs +name: spam_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spam_pipeline` is a English model originally trained by Luisdahuis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spam_pipeline_en_5.5.0_3.0_1727020407646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spam_pipeline_en_5.5.0_3.0_1727020407646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spam_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spam_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Luisdahuis/spam + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_pipeline_en.md new file mode 100644 index 00000000000000..97863d957d57d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sreegeni_finetune_textclass_auto_pipeline pipeline RoBertaForSequenceClassification from sreerammadhu +author: John Snow Labs +name: sreegeni_finetune_textclass_auto_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sreegeni_finetune_textclass_auto_pipeline` is a English model originally trained by sreerammadhu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sreegeni_finetune_textclass_auto_pipeline_en_5.5.0_3.0_1727017258993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sreegeni_finetune_textclass_auto_pipeline_en_5.5.0_3.0_1727017258993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sreegeni_finetune_textclass_auto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sreegeni_finetune_textclass_auto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sreegeni_finetune_textclass_auto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.7 MB| + +## References + +https://huggingface.co/sreerammadhu/sreegeni-finetune-textclass-auto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_en.md new file mode 100644 index 00000000000000..4ffdb7d52af9b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stereoset_roberta_large_finetuned RoBertaForSequenceClassification from henryscheible +author: John Snow Labs +name: stereoset_roberta_large_finetuned +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereoset_roberta_large_finetuned` is a English model originally trained by henryscheible. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereoset_roberta_large_finetuned_en_5.5.0_3.0_1727037325345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereoset_roberta_large_finetuned_en_5.5.0_3.0_1727037325345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("stereoset_roberta_large_finetuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("stereoset_roberta_large_finetuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereoset_roberta_large_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/henryscheible/stereoset_roberta-large_finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..befdc1f8de6b44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stereoset_roberta_large_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stereoset_roberta_large_finetuned_pipeline pipeline RoBertaForSequenceClassification from henryscheible +author: John Snow Labs +name: stereoset_roberta_large_finetuned_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereoset_roberta_large_finetuned_pipeline` is a English model originally trained by henryscheible. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereoset_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1727037414939.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereoset_roberta_large_finetuned_pipeline_en_5.5.0_3.0_1727037414939.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stereoset_roberta_large_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stereoset_roberta_large_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereoset_roberta_large_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/henryscheible/stereoset_roberta-large_finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_en.md b/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_en.md new file mode 100644 index 00000000000000..10724ac8f6cfe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stock_sentiment_hp BertForSequenceClassification from slisowski +author: John Snow Labs +name: stock_sentiment_hp +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stock_sentiment_hp` is a English model originally trained by slisowski. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stock_sentiment_hp_en_5.5.0_3.0_1727032021377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stock_sentiment_hp_en_5.5.0_3.0_1727032021377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("stock_sentiment_hp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("stock_sentiment_hp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stock_sentiment_hp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/slisowski/stock_sentiment_hp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_pipeline_en.md new file mode 100644 index 00000000000000..f61f7b0831fe00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stock_sentiment_hp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stock_sentiment_hp_pipeline pipeline BertForSequenceClassification from slisowski +author: John Snow Labs +name: stock_sentiment_hp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stock_sentiment_hp_pipeline` is a English model originally trained by slisowski. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stock_sentiment_hp_pipeline_en_5.5.0_3.0_1727032044844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stock_sentiment_hp_pipeline_en_5.5.0_3.0_1727032044844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stock_sentiment_hp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stock_sentiment_hp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stock_sentiment_hp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/slisowski/stock_sentiment_hp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_en.md b/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_en.md new file mode 100644 index 00000000000000..6afe2940ffebda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English strategytransitionplanv1 RoBertaForSequenceClassification from lomov +author: John Snow Labs +name: strategytransitionplanv1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`strategytransitionplanv1` is a English model originally trained by lomov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/strategytransitionplanv1_en_5.5.0_3.0_1727017356142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/strategytransitionplanv1_en_5.5.0_3.0_1727017356142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("strategytransitionplanv1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("strategytransitionplanv1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|strategytransitionplanv1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.6 MB| + +## References + +https://huggingface.co/lomov/strategytransitionplanv1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en.md new file mode 100644 index 00000000000000..402125694c1c6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline pipeline BertForSequenceClassification from Katsiaryna +author: John Snow Labs +name: stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline` is a English model originally trained by Katsiaryna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en_5.5.0_3.0_1727034720633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline_en_5.5.0_3.0_1727034720633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/Katsiaryna/stsb-TinyBERT-L-4-finetuned_auc_151221-top3_op2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_en.md b/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_en.md new file mode 100644 index 00000000000000..0e4c5daa1d0a0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English suicide_distilbert_2_4 DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_2_4 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_2_4` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_4_en_5.5.0_3.0_1727012649237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_4_en_5.5.0_3.0_1727012649237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("suicide_distilbert_2_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("suicide_distilbert_2_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_2_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-2-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_pipeline_sw.md new file mode 100644 index 00000000000000..5437302b34c777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) swahili_ner_bertbase_cased_pipeline pipeline BertForTokenClassification from eolang +author: John Snow Labs +name: swahili_ner_bertbase_cased_pipeline +date: 2024-09-22 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swahili_ner_bertbase_cased_pipeline` is a Swahili (macrolanguage) model originally trained by eolang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swahili_ner_bertbase_cased_pipeline_sw_5.5.0_3.0_1727040538207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swahili_ner_bertbase_cased_pipeline_sw_5.5.0_3.0_1727040538207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("swahili_ner_bertbase_cased_pipeline", lang = "sw") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("swahili_ner_bertbase_cased_pipeline", lang = "sw") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swahili_ner_bertbase_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|665.1 MB| + +## References + +https://huggingface.co/eolang/Swahili-NER-BertBase-Cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_sw.md b/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_sw.md new file mode 100644 index 00000000000000..e96120ec880b71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-swahili_ner_bertbase_cased_sw.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Swahili (macrolanguage) swahili_ner_bertbase_cased BertForTokenClassification from eolang +author: John Snow Labs +name: swahili_ner_bertbase_cased +date: 2024-09-22 +tags: [sw, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swahili_ner_bertbase_cased` is a Swahili (macrolanguage) model originally trained by eolang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swahili_ner_bertbase_cased_sw_5.5.0_3.0_1727040504584.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swahili_ner_bertbase_cased_sw_5.5.0_3.0_1727040504584.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("swahili_ner_bertbase_cased","sw") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("swahili_ner_bertbase_cased", "sw") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swahili_ner_bertbase_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|665.1 MB| + +## References + +https://huggingface.co/eolang/Swahili-NER-BertBase-Cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_pipeline_en.md new file mode 100644 index 00000000000000..bd4f20628cdedf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English swot_classifier_pipeline pipeline DistilBertForSequenceClassification from jcaponigro +author: John Snow Labs +name: swot_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swot_classifier_pipeline` is a English model originally trained by jcaponigro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swot_classifier_pipeline_en_5.5.0_3.0_1727012243916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swot_classifier_pipeline_en_5.5.0_3.0_1727012243916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("swot_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("swot_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swot_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jcaponigro/SWOT_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-t_7_en.md b/docs/_posts/ahmedlone127/2024-09-22-t_7_en.md new file mode 100644 index 00000000000000..798452b53b2d21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-t_7_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_7 RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_7 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_7` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_7_en_5.5.0_3.0_1727017187438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_7_en_5.5.0_3.0_1727017187438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_7","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("t_7", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-t_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-t_7_pipeline_en.md new file mode 100644 index 00000000000000..0d2c34f566347f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-t_7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_7_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_7_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_7_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_7_pipeline_en_5.5.0_3.0_1727017215307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_7_pipeline_en_5.5.0_3.0_1727017215307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.2 MB| + +## References + +https://huggingface.co/Pablojmed/t_7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_en.md b/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_en.md new file mode 100644 index 00000000000000..a45f9ccf69e6f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tanya_mama_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tanya_mama_ner XlmRoBertaForTokenClassification from Domo123 +author: John Snow Labs +name: tanya_mama_ner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tanya_mama_ner` is a English model originally trained by Domo123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tanya_mama_ner_en_5.5.0_3.0_1727019506290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tanya_mama_ner_en_5.5.0_3.0_1727019506290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tanya_mama_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tanya_mama_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tanya_mama_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|841.6 MB| + +## References + +https://huggingface.co/Domo123/tanya-mama-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tapp_multilabel_climatebert_f_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-tapp_multilabel_climatebert_f_pipeline_en.md new file mode 100644 index 00000000000000..1a16abec5c79e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tapp_multilabel_climatebert_f_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tapp_multilabel_climatebert_f_pipeline pipeline RoBertaForSequenceClassification from GIZ +author: John Snow Labs +name: tapp_multilabel_climatebert_f_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tapp_multilabel_climatebert_f_pipeline` is a English model originally trained by GIZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tapp_multilabel_climatebert_f_pipeline_en_5.5.0_3.0_1726972206134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tapp_multilabel_climatebert_f_pipeline_en_5.5.0_3.0_1726972206134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tapp_multilabel_climatebert_f_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tapp_multilabel_climatebert_f_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tapp_multilabel_climatebert_f_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.7 MB| + +## References + +https://huggingface.co/GIZ/TAPP-multilabel-climatebert_f + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-task4_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-task4_2_en.md new file mode 100644 index 00000000000000..800adf85bbce40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-task4_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English task4_2 DistilBertForSequenceClassification from Koreander +author: John Snow Labs +name: task4_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`task4_2` is a English model originally trained by Koreander. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/task4_2_en_5.5.0_3.0_1727033707058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/task4_2_en_5.5.0_3.0_1727033707058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("task4_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("task4_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|task4_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Koreander/task4-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_glue_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_glue_pipeline_en.md new file mode 100644 index 00000000000000..5d47623797709e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_glue_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_glue_pipeline pipeline DistilBertForSequenceClassification from honghk +author: John Snow Labs +name: test_glue_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_glue_pipeline` is a English model originally trained by honghk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_glue_pipeline_en_5.5.0_3.0_1727012761309.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_glue_pipeline_en_5.5.0_3.0_1727012761309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_glue_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_glue_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_glue_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/honghk/test-glue + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_medner_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_medner_en.md new file mode 100644 index 00000000000000..bc3c40361e2e41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_medner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_medner BertForTokenClassification from adalbertojunior +author: John Snow Labs +name: test_medner +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_medner` is a English model originally trained by adalbertojunior. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_medner_en_5.5.0_3.0_1727040323041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_medner_en_5.5.0_3.0_1727040323041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("test_medner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("test_medner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_medner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|664.9 MB| + +## References + +https://huggingface.co/adalbertojunior/test-medner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_medner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_medner_pipeline_en.md new file mode 100644 index 00000000000000..986581b6039bb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_medner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_medner_pipeline pipeline BertForTokenClassification from adalbertojunior +author: John Snow Labs +name: test_medner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_medner_pipeline` is a English model originally trained by adalbertojunior. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_medner_pipeline_en_5.5.0_3.0_1727040356909.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_medner_pipeline_en_5.5.0_3.0_1727040356909.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_medner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_medner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_medner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|664.9 MB| + +## References + +https://huggingface.co/adalbertojunior/test-medner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_model_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_model_2_en.md new file mode 100644 index 00000000000000..9e30af59c33b48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_model_2_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: English test_model_2 DistilBertEmbeddings from TamBeo +author: John Snow Labs +name: test_model_2 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, distilbert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_2` is a English model originally trained by TamBeo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_2_en_5.5.0_3.0_1727012643926.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_2_en_5.5.0_3.0_1727012643926.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DistilBertEmbeddings.pretrained("test_model_2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DistilBertEmbeddings.pretrained("test_model_2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/TamBeo/test_model_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_roberta_euhaq_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_roberta_euhaq_pipeline_en.md new file mode 100644 index 00000000000000..775978fcaecaf6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_roberta_euhaq_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_roberta_euhaq_pipeline pipeline RoBertaForSequenceClassification from euhaq +author: John Snow Labs +name: test_roberta_euhaq_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_roberta_euhaq_pipeline` is a English model originally trained by euhaq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_roberta_euhaq_pipeline_en_5.5.0_3.0_1726967578627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_roberta_euhaq_pipeline_en_5.5.0_3.0_1726967578627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_roberta_euhaq_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_roberta_euhaq_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_roberta_euhaq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.9 MB| + +## References + +https://huggingface.co/euhaq/test_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_en.md new file mode 100644 index 00000000000000..de42e6e00af75a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_trainer11a DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: test_trainer11a +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer11a` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer11a_en_5.5.0_3.0_1727020392937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer11a_en_5.5.0_3.0_1727020392937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer11a","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_trainer11a", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer11a| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/test_trainer11a \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_ver5_ko.md b/docs/_posts/ahmedlone127/2024-09-22-test_ver5_ko.md new file mode 100644 index 00000000000000..6134ad775edbf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_ver5_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean test_ver5 WhisperForCTC from Jpep26 +author: John Snow Labs +name: test_ver5 +date: 2024-09-22 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_ver5` is a Korean model originally trained by Jpep26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_ver5_ko_5.5.0_3.0_1727023573446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_ver5_ko_5.5.0_3.0_1727023573446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("test_ver5","ko") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("test_ver5", "ko") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_ver5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|642.4 MB| + +## References + +https://huggingface.co/Jpep26/test_ver5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_ver5_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-22-test_ver5_pipeline_ko.md new file mode 100644 index 00000000000000..4873ef831319b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_ver5_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean test_ver5_pipeline pipeline WhisperForCTC from Jpep26 +author: John Snow Labs +name: test_ver5_pipeline +date: 2024-09-22 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_ver5_pipeline` is a Korean model originally trained by Jpep26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_ver5_pipeline_ko_5.5.0_3.0_1727023606900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_ver5_pipeline_ko_5.5.0_3.0_1727023606900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_ver5_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_ver5_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_ver5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|642.4 MB| + +## References + +https://huggingface.co/Jpep26/test_ver5 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_en.md b/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_en.md new file mode 100644 index 00000000000000..ed9d284c2f7c13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_classification_5000 DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: text_classification_5000 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_5000` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_5000_en_5.5.0_3.0_1727021071557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_5000_en_5.5.0_3.0_1727021071557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_5000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_5000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_5000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/Text_Classification_5000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_pipeline_en.md new file mode 100644 index 00000000000000..ff1edc0e89c102 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-text_classification_5000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_classification_5000_pipeline pipeline DistilBertForSequenceClassification from Neroism8422 +author: John Snow Labs +name: text_classification_5000_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_5000_pipeline` is a English model originally trained by Neroism8422. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_5000_pipeline_en_5.5.0_3.0_1727021083192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_5000_pipeline_en_5.5.0_3.0_1727021083192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_classification_5000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_classification_5000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_5000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Neroism8422/Text_Classification_5000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en.md b/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en.md new file mode 100644 index 00000000000000..1db666bf8032ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13 WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13 +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en_5.5.0_3.0_1727023433362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_en_5.5.0_3.0_1727023433362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|394.5 MB| + +## References + +https://huggingface.co/saahith/tiny.en-combined_v3-trimmed-1-0.15-8-1e-05-dainty-sweep-13 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en.md new file mode 100644 index 00000000000000..d7d186e173c03b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en_5.5.0_3.0_1727023452999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline_en_5.5.0_3.0_1727023452999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_english_combined_v3_trimmed_1_0_15_8_1e_05_dainty_sweep_13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.5 MB| + +## References + +https://huggingface.co/saahith/tiny.en-combined_v3-trimmed-1-0.15-8-1e-05-dainty-sweep-13 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_en.md new file mode 100644 index 00000000000000..32ab34fc51f6bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-topic_topic_random1_seed1_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random1_seed1_roberta_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed1_roberta_large +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed1_roberta_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed1_roberta_large_en_5.5.0_3.0_1727026950525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed1_roberta_large_en_5.5.0_3.0_1727026950525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("topic_topic_random1_seed1_roberta_large","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("topic_topic_random1_seed1_roberta_large", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed1_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed1-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-trainer8_en.md b/docs/_posts/ahmedlone127/2024-09-22-trainer8_en.md new file mode 100644 index 00000000000000..7ddf5565836888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-trainer8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainer8 DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainer8 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer8` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer8_en_5.5.0_3.0_1727020486538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer8_en_5.5.0_3.0_1727020486538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer8","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer8", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainer8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_pipeline_en.md new file mode 100644 index 00000000000000..fe40b5ea4906d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-transformer_classification_lex_ceo_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English transformer_classification_lex_ceo_test_pipeline pipeline RoBertaForSequenceClassification from rd-1 +author: John Snow Labs +name: transformer_classification_lex_ceo_test_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transformer_classification_lex_ceo_test_pipeline` is a English model originally trained by rd-1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transformer_classification_lex_ceo_test_pipeline_en_5.5.0_3.0_1727027029444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transformer_classification_lex_ceo_test_pipeline_en_5.5.0_3.0_1727027029444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("transformer_classification_lex_ceo_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("transformer_classification_lex_ceo_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transformer_classification_lex_ceo_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.4 MB| + +## References + +https://huggingface.co/rd-1/transformer_classification_lex_ceo_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_en.md b/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_en.md new file mode 100644 index 00000000000000..e3eb9556e3cb6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_hrudhai_rajasekhar RoBertaForSequenceClassification from hrudhai-rajasekhar +author: John Snow Labs +name: trial_model_hrudhai_rajasekhar +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_hrudhai_rajasekhar` is a English model originally trained by hrudhai-rajasekhar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_hrudhai_rajasekhar_en_5.5.0_3.0_1726967529018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_hrudhai_rajasekhar_en_5.5.0_3.0_1726967529018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_hrudhai_rajasekhar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_hrudhai_rajasekhar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_hrudhai_rajasekhar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/hrudhai-rajasekhar/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_pipeline_en.md new file mode 100644 index 00000000000000..71afcb5558d9a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-trial_model_hrudhai_rajasekhar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trial_model_hrudhai_rajasekhar_pipeline pipeline RoBertaForSequenceClassification from hrudhai-rajasekhar +author: John Snow Labs +name: trial_model_hrudhai_rajasekhar_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_hrudhai_rajasekhar_pipeline` is a English model originally trained by hrudhai-rajasekhar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_hrudhai_rajasekhar_pipeline_en_5.5.0_3.0_1726967571784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_hrudhai_rajasekhar_pipeline_en_5.5.0_3.0_1726967571784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trial_model_hrudhai_rajasekhar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trial_model_hrudhai_rajasekhar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_hrudhai_rajasekhar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/hrudhai-rajasekhar/trial-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-twitchleaguebert_260k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-twitchleaguebert_260k_pipeline_en.md new file mode 100644 index 00000000000000..b3dac2e1d41fe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-twitchleaguebert_260k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitchleaguebert_260k_pipeline pipeline RoBertaEmbeddings from Epidot +author: John Snow Labs +name: twitchleaguebert_260k_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitchleaguebert_260k_pipeline` is a English model originally trained by Epidot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitchleaguebert_260k_pipeline_en_5.5.0_3.0_1726999591099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitchleaguebert_260k_pipeline_en_5.5.0_3.0_1726999591099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitchleaguebert_260k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitchleaguebert_260k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitchleaguebert_260k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|305.9 MB| + +## References + +https://huggingface.co/Epidot/TwitchLeagueBert-260k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_en.md b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_en.md new file mode 100644 index 00000000000000..243510cd5a18b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_3epoch10_64 RoBertaForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: twitter_roberta_base_3epoch10_64 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_3epoch10_64` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_3epoch10_64_en_5.5.0_3.0_1727036995070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_3epoch10_64_en_5.5.0_3.0_1727036995070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_3epoch10_64","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_3epoch10_64", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_3epoch10_64| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/dianamihalache27/twitter-roberta-base_3epoch10.64 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-type_inference_en.md b/docs/_posts/ahmedlone127/2024-09-22-type_inference_en.md new file mode 100644 index 00000000000000..5c9096d78e304a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-type_inference_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English type_inference RoBertaEmbeddings from ZQ +author: John Snow Labs +name: type_inference +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`type_inference` is a English model originally trained by ZQ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/type_inference_en_5.5.0_3.0_1727041895483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/type_inference_en_5.5.0_3.0_1727041895483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("type_inference","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("type_inference","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|type_inference| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/ZQ/Type_Inference \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-type_inference_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-type_inference_pipeline_en.md new file mode 100644 index 00000000000000..5daa836f12fb3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-type_inference_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English type_inference_pipeline pipeline RoBertaEmbeddings from ZQ +author: John Snow Labs +name: type_inference_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`type_inference_pipeline` is a English model originally trained by ZQ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/type_inference_pipeline_en_5.5.0_3.0_1727041919414.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/type_inference_pipeline_en_5.5.0_3.0_1727041919414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("type_inference_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("type_inference_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|type_inference_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/ZQ/Type_Inference + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_en.md b/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_en.md new file mode 100644 index 00000000000000..642b62e7fe4d9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-uned_tfg_08_62_mas_frecuentes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English uned_tfg_08_62_mas_frecuentes RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_62_mas_frecuentes +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_62_mas_frecuentes` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_62_mas_frecuentes_en_5.5.0_3.0_1727026987671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_62_mas_frecuentes_en_5.5.0_3.0_1727026987671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_62_mas_frecuentes","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("uned_tfg_08_62_mas_frecuentes", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_62_mas_frecuentes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|430.8 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.62_mas_frecuentes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-voicemath_tiny_en.md b/docs/_posts/ahmedlone127/2024-09-22-voicemath_tiny_en.md new file mode 100644 index 00000000000000..709590bdb89fad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-voicemath_tiny_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English voicemath_tiny WhisperForCTC from hoonsung +author: John Snow Labs +name: voicemath_tiny +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`voicemath_tiny` is a English model originally trained by hoonsung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/voicemath_tiny_en_5.5.0_3.0_1726995741643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/voicemath_tiny_en_5.5.0_3.0_1726995741643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("voicemath_tiny","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("voicemath_tiny", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|voicemath_tiny| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|394.9 MB| + +## References + +https://huggingface.co/hoonsung/VoiceMath-Tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_pipeline_en.md new file mode 100644 index 00000000000000..3b891a789404df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_ai_nose_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nose_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nose_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nose_pipeline_en_5.5.0_3.0_1727022480769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nose_pipeline_en_5.5.0_3.0_1727022480769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_ai_nose_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_ai_nose_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nose_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nose + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base2_ko.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base2_ko.md new file mode 100644 index 00000000000000..71dfa1ef9325ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base2_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean whisper_base2 WhisperForCTC from Dearlie +author: John Snow Labs +name: whisper_base2 +date: 2024-09-22 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base2` is a Korean model originally trained by Dearlie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base2_ko_5.5.0_3.0_1726997724494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base2_ko_5.5.0_3.0_1726997724494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base2","ko") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base2", "ko") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|398.6 MB| + +## References + +https://huggingface.co/Dearlie/whisper-base2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_ga.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_ga.md new file mode 100644 index 00000000000000..fb32c3f6dc701c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_ga.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Irish whisper_base_ga2en_v1_1 WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_base_ga2en_v1_1 +date: 2024-09-22 +tags: [ga, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_ga2en_v1_1` is a Irish model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_ga2en_v1_1_ga_5.5.0_3.0_1727024714531.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_ga2en_v1_1_ga_5.5.0_3.0_1727024714531.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_ga2en_v1_1","ga") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_ga2en_v1_1", "ga") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_ga2en_v1_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ga| +|Size:|641.2 MB| + +## References + +https://huggingface.co/ymoslem/whisper-base-ga2en-v1.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_de.md new file mode 100644 index 00000000000000..87a9e542d42685 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_germanmed_full_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_base_germanmed_full WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_base_germanmed_full +date: 2024-09-22 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_germanmed_full` is a German model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_germanmed_full_de_5.5.0_3.0_1727023098544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_germanmed_full_de_5.5.0_3.0_1727023098544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_germanmed_full","de") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_germanmed_full", "de") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_germanmed_full| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|614.6 MB| + +## References + +https://huggingface.co/Hanhpt23/whisper-base-GermanMed-full \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_bm.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_bm.md new file mode 100644 index 00000000000000..75bf1eb4053036 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_bm.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Bambara whisper_medium_bambara_field WhisperForCTC from RobbieJimersonJr +author: John Snow Labs +name: whisper_medium_bambara_field +date: 2024-09-22 +tags: [bm, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: bm +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_bambara_field` is a Bambara model originally trained by RobbieJimersonJr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_bambara_field_bm_5.5.0_3.0_1727025634380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_bambara_field_bm_5.5.0_3.0_1727025634380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_medium_bambara_field","bm") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_medium_bambara_field", "bm") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_bambara_field| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|bm| +|Size:|4.8 GB| + +## References + +https://huggingface.co/RobbieJimersonJr/whisper-medium-Bambara-field \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_pipeline_bm.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_pipeline_bm.md new file mode 100644 index 00000000000000..52c75d02f7ebe2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_bambara_field_pipeline_bm.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bambara whisper_medium_bambara_field_pipeline pipeline WhisperForCTC from RobbieJimersonJr +author: John Snow Labs +name: whisper_medium_bambara_field_pipeline +date: 2024-09-22 +tags: [bm, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bm +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_bambara_field_pipeline` is a Bambara model originally trained by RobbieJimersonJr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_bambara_field_pipeline_bm_5.5.0_3.0_1727025839775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_bambara_field_pipeline_bm_5.5.0_3.0_1727025839775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_medium_bambara_field_pipeline", lang = "bm") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_medium_bambara_field_pipeline", lang = "bm") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_bambara_field_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bm| +|Size:|4.8 GB| + +## References + +https://huggingface.co/RobbieJimersonJr/whisper-medium-Bambara-field + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_tajik_tj_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_tajik_tj_en.md new file mode 100644 index 00000000000000..6f353616033629 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_medium_tajik_tj_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_tajik_tj WhisperForCTC from muhtasham +author: John Snow Labs +name: whisper_medium_tajik_tj +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_tajik_tj` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_tajik_tj_en_5.5.0_3.0_1727024476664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_tajik_tj_en_5.5.0_3.0_1727024476664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_medium_tajik_tj","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_medium_tajik_tj", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_tajik_tj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/muhtasham/whisper-medium-tg_tj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_sft_german_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_sft_german_de.md new file mode 100644 index 00000000000000..04b00dc75d67f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_sft_german_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_sft_german WhisperForCTC from Jamin20 +author: John Snow Labs +name: whisper_sft_german +date: 2024-09-22 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_sft_german` is a German model originally trained by Jamin20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_sft_german_de_5.5.0_3.0_1726982555988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_sft_german_de_5.5.0_3.0_1726982555988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_sft_german","de") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_sft_german", "de") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_sft_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Jamin20/whisper_sft_de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_de.md new file mode 100644 index 00000000000000..1627ff442a7f6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_de.md @@ -0,0 +1,84 @@ +--- +layout: model +title: German whisper_small_allsnr_v4 WhisperForCTC from marccgrau +author: John Snow Labs +name: whisper_small_allsnr_v4 +date: 2024-09-22 +tags: [de, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_allsnr_v4` is a German model originally trained by marccgrau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_allsnr_v4_de_5.5.0_3.0_1727024315731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_allsnr_v4_de_5.5.0_3.0_1727024315731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_allsnr_v4","de") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_allsnr_v4", "de") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_allsnr_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|de| +|Size:|1.7 GB| + +## References + +https://huggingface.co/marccgrau/whisper-small-allSNR-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_pipeline_de.md new file mode 100644 index 00000000000000..8faa4e15835957 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_allsnr_v4_pipeline_de.md @@ -0,0 +1,69 @@ +--- +layout: model +title: German whisper_small_allsnr_v4_pipeline pipeline WhisperForCTC from marccgrau +author: John Snow Labs +name: whisper_small_allsnr_v4_pipeline +date: 2024-09-22 +tags: [de, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_allsnr_v4_pipeline` is a German model originally trained by marccgrau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_allsnr_v4_pipeline_de_5.5.0_3.0_1727024417964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_allsnr_v4_pipeline_de_5.5.0_3.0_1727024417964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_allsnr_v4_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_allsnr_v4_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_allsnr_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|1.7 GB| + +## References + +https://huggingface.co/marccgrau/whisper-small-allSNR-v4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_huggingpanda_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_huggingpanda_pipeline_en.md new file mode 100644 index 00000000000000..667a983e8dff93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_arabic_huggingpanda_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_arabic_huggingpanda_pipeline pipeline WhisperForCTC from HuggingPanda +author: John Snow Labs +name: whisper_small_arabic_huggingpanda_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_huggingpanda_pipeline` is a English model originally trained by HuggingPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_huggingpanda_pipeline_en_5.5.0_3.0_1727023032941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_huggingpanda_pipeline_en_5.5.0_3.0_1727023032941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabic_huggingpanda_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabic_huggingpanda_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_huggingpanda_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/HuggingPanda/whisper-small-arabic + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_en.md new file mode 100644 index 00000000000000..5270386f3e6c46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_avnishkanungo_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_divehi_avnishkanungo WhisperForCTC from avnishkanungo +author: John Snow Labs +name: whisper_small_divehi_avnishkanungo +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_avnishkanungo` is a English model originally trained by avnishkanungo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_avnishkanungo_en_5.5.0_3.0_1727024617000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_avnishkanungo_en_5.5.0_3.0_1727024617000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_divehi_avnishkanungo","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_avnishkanungo", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_avnishkanungo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/avnishkanungo/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_dahml_dv.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_dahml_dv.md new file mode 100644 index 00000000000000..fabe22ca64f645 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_divehi_dahml_dv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_dahml WhisperForCTC from DahmL +author: John Snow Labs +name: whisper_small_divehi_dahml +date: 2024-09-22 +tags: [dv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_dahml` is a Dhivehi, Divehi, Maldivian model originally trained by DahmL. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_dahml_dv_5.5.0_3.0_1726982482552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_dahml_dv_5.5.0_3.0_1726982482552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_divehi_dahml","dv") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_divehi_dahml", "dv") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_dahml| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/DahmL/whisper-small-dv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_estonian_rristo_pipeline_et.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_estonian_rristo_pipeline_et.md new file mode 100644 index 00000000000000..d1717b1c06e328 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_estonian_rristo_pipeline_et.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Estonian whisper_small_estonian_rristo_pipeline pipeline WhisperForCTC from rristo +author: John Snow Labs +name: whisper_small_estonian_rristo_pipeline +date: 2024-09-22 +tags: [et, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: et +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_estonian_rristo_pipeline` is a Estonian model originally trained by rristo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_estonian_rristo_pipeline_et_5.5.0_3.0_1726997409451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_estonian_rristo_pipeline_et_5.5.0_3.0_1726997409451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_estonian_rristo_pipeline", lang = "et") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_estonian_rristo_pipeline", lang = "et") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_estonian_rristo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|et| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rristo/whisper-small-et + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_en.md new file mode 100644 index 00000000000000..e0e92440d1131c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_srirama WhisperForCTC from srirama +author: John Snow Labs +name: whisper_small_hindi_srirama +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_srirama` is a English model originally trained by srirama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_srirama_en_5.5.0_3.0_1727024528122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_srirama_en_5.5.0_3.0_1727024528122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_hindi_srirama","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_srirama", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_srirama| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/srirama/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_irish_pipeline_ga.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_irish_pipeline_ga.md new file mode 100644 index 00000000000000..77dcb8905b75b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_irish_pipeline_ga.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Irish whisper_small_irish_pipeline pipeline WhisperForCTC from callum-canavan +author: John Snow Labs +name: whisper_small_irish_pipeline +date: 2024-09-22 +tags: [ga, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_irish_pipeline` is a Irish model originally trained by callum-canavan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_irish_pipeline_ga_5.5.0_3.0_1726997773111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_irish_pipeline_ga_5.5.0_3.0_1726997773111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_irish_pipeline", lang = "ga") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_irish_pipeline", lang = "ga") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_irish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ga| +|Size:|1.7 GB| + +## References + +https://huggingface.co/callum-canavan/whisper-small-ga + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_nomo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_nomo_pipeline_en.md new file mode 100644 index 00000000000000..cac50e713c82e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_nomo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_nomo_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_small_nomo_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nomo_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nomo_pipeline_en_5.5.0_3.0_1726994946969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nomo_pipeline_en_5.5.0_3.0_1726994946969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_nomo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_nomo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nomo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-small-nomo + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_pipeline_en.md new file mode 100644 index 00000000000000..91c0d0b459dffe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_r2_50k_2ep_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_r2_50k_2ep_pipeline pipeline WhisperForCTC from spsither +author: John Snow Labs +name: whisper_small_r2_50k_2ep_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_r2_50k_2ep_pipeline` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_r2_50k_2ep_pipeline_en_5.5.0_3.0_1727024901961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_r2_50k_2ep_pipeline_en_5.5.0_3.0_1727024901961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_r2_50k_2ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_r2_50k_2ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_r2_50k_2ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/spsither/whisper-small-r2-50k-2ep + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_tamil_steja_ta.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_tamil_steja_ta.md new file mode 100644 index 00000000000000..6e8e8a69145e80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_tamil_steja_ta.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Tamil whisper_small_tamil_steja WhisperForCTC from steja +author: John Snow Labs +name: whisper_small_tamil_steja +date: 2024-09-22 +tags: [ta, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tamil_steja` is a Tamil model originally trained by steja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_steja_ta_5.5.0_3.0_1726983471823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_steja_ta_5.5.0_3.0_1726983471823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_tamil_steja","ta") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_tamil_steja", "ta") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tamil_steja| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ta| +|Size:|1.7 GB| + +## References + +https://huggingface.co/steja/whisper-small-tamil \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_tr.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_tr.md new file mode 100644 index 00000000000000..0864d327267c67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_turkish_cp2_tr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Turkish whisper_small_turkish_cp2 WhisperForCTC from Kiwipirate +author: John Snow Labs +name: whisper_small_turkish_cp2 +date: 2024-09-22 +tags: [tr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_cp2` is a Turkish model originally trained by Kiwipirate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp2_tr_5.5.0_3.0_1727024993129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_cp2_tr_5.5.0_3.0_1727024993129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_turkish_cp2","tr") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_turkish_cp2", "tr") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_cp2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|tr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Kiwipirate/whisper-small-tr-cp2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_en.md new file mode 100644 index 00000000000000..da31c2a0b19b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_yoruba_omoekan WhisperForCTC from omoekan +author: John Snow Labs +name: whisper_small_yoruba_omoekan +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yoruba_omoekan` is a English model originally trained by omoekan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_omoekan_en_5.5.0_3.0_1727025247375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_omoekan_en_5.5.0_3.0_1727025247375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_yoruba_omoekan","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_yoruba_omoekan", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yoruba_omoekan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/omoekan/whisper-small-yoruba \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_pipeline_en.md new file mode 100644 index 00000000000000..4ac11dcf4acae6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_yoruba_omoekan_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_yoruba_omoekan_pipeline pipeline WhisperForCTC from omoekan +author: John Snow Labs +name: whisper_small_yoruba_omoekan_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_yoruba_omoekan_pipeline` is a English model originally trained by omoekan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_omoekan_pipeline_en_5.5.0_3.0_1727025329863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_yoruba_omoekan_pipeline_en_5.5.0_3.0_1727025329863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_yoruba_omoekan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_yoruba_omoekan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_yoruba_omoekan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/omoekan/whisper-small-yoruba + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_final_cafet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_final_cafet_pipeline_en.md new file mode 100644 index 00000000000000..01037aea44f1c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_final_cafet_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_final_cafet_pipeline pipeline WhisperForCTC from Cafet +author: John Snow Labs +name: whisper_tiny_final_cafet_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_final_cafet_pipeline` is a English model originally trained by Cafet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_final_cafet_pipeline_en_5.5.0_3.0_1726983640075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_final_cafet_pipeline_en_5.5.0_3.0_1726983640075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_final_cafet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_final_cafet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_final_cafet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.6 MB| + +## References + +https://huggingface.co/Cafet/whisper-tiny-final + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_italian_mattiasu96_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_italian_mattiasu96_pipeline_it.md new file mode 100644 index 00000000000000..9c2535faa2aec4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_italian_mattiasu96_pipeline_it.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Italian whisper_tiny_italian_mattiasu96_pipeline pipeline WhisperForCTC from mattiasu96 +author: John Snow Labs +name: whisper_tiny_italian_mattiasu96_pipeline +date: 2024-09-22 +tags: [it, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_italian_mattiasu96_pipeline` is a Italian model originally trained by mattiasu96. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_mattiasu96_pipeline_it_5.5.0_3.0_1727022253421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_italian_mattiasu96_pipeline_it_5.5.0_3.0_1727022253421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_italian_mattiasu96_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_italian_mattiasu96_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_italian_mattiasu96_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|390.8 MB| + +## References + +https://huggingface.co/mattiasu96/whisper-tiny-it + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_en.md new file mode 100644 index 00000000000000..848ee5c99e1235 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_hewliyang_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_hewliyang WhisperForCTC from hewliyang +author: John Snow Labs +name: whisper_tiny_minds14_hewliyang +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_hewliyang` is a English model originally trained by hewliyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_hewliyang_en_5.5.0_3.0_1727022270391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_hewliyang_en_5.5.0_3.0_1727022270391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_hewliyang","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_hewliyang", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_hewliyang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.9 MB| + +## References + +https://huggingface.co/hewliyang/whisper-tiny-minds14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..92e1ab20aff7a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_mrbs_test_tags_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mrbs_test_tags_cwadj_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mrbs_test_tags_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en_5.5.0_3.0_1727012247048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mrbs_test_tags_cwadj_pipeline_en_5.5.0_3.0_1727012247048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_mrbs_test_tags_cwadj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_mrbs_test_tags_cwadj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mrbs_test_tags_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mrbs_test-tags-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_en.md new file mode 100644 index 00000000000000..8f60509684b7c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_1 XlmRoBertaForSequenceClassification from alyazharr +author: John Snow Labs +name: xlm_roberta_base_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_1` is a English model originally trained by alyazharr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_1_en_5.5.0_3.0_1727009665776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_1_en_5.5.0_3.0_1727009665776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|827.5 MB| + +## References + +https://huggingface.co/alyazharr/xlm_roberta_base_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_pipeline_en.md new file mode 100644 index 00000000000000..aa559204ea09b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_1_pipeline pipeline XlmRoBertaForSequenceClassification from alyazharr +author: John Snow Labs +name: xlm_roberta_base_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_1_pipeline` is a English model originally trained by alyazharr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_1_pipeline_en_5.5.0_3.0_1727009749083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_1_pipeline_en_5.5.0_3.0_1727009749083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.5 MB| + +## References + +https://huggingface.co/alyazharr/xlm_roberta_base_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_en.md new file mode 100644 index 00000000000000..c580d1940bb1be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_replace_w2v XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_replace_w2v +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_replace_w2v` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_w2v_en_5.5.0_3.0_1727010095030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_w2v_en_5.5.0_3.0_1727010095030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_replace_w2v","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_vietnam_aug_replace_w2v", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_replace_w2v| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.9 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_replace_w2v \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en.md new file mode 100644 index 00000000000000..bd7288323e4a13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en_5.5.0_3.0_1727010210731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline_en_5.5.0_3.0_1727010210731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_vietnam_aug_replace_w2v_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|796.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_VietNam-aug_replace_w2v + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_mealduct_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_mealduct_en.md new file mode 100644 index 00000000000000..6329f595cfbb2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_all_mealduct_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_mealduct XlmRoBertaForTokenClassification from MealDuct +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_mealduct +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_mealduct` is a English model originally trained by MealDuct. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_mealduct_en_5.5.0_3.0_1727018613326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_mealduct_en_5.5.0_3.0_1727018613326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_mealduct","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_mealduct", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_mealduct| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/MealDuct/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_en.md new file mode 100644 index 00000000000000..aac2f07232ee5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_rupe_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_rupe XlmRoBertaForTokenClassification from RupE +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_rupe +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_rupe` is a English model originally trained by RupE. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_rupe_en_5.5.0_3.0_1727018736421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_rupe_en_5.5.0_3.0_1727018736421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_rupe","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_rupe", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_rupe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|833.0 MB| + +## References + +https://huggingface.co/RupE/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_en.md new file mode 100644 index 00000000000000..2e68e83a984b8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_yasu320001 XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_yasu320001 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_yasu320001` is a English model originally trained by yasu320001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yasu320001_en_5.5.0_3.0_1727018931165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yasu320001_en_5.5.0_3.0_1727018931165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_yasu320001","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_yasu320001", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_yasu320001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en.md new file mode 100644 index 00000000000000..1d5cdf15671df1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline pipeline XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline` is a English model originally trained by yasu320001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en_5.5.0_3.0_1727019008995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline_en_5.5.0_3.0_1727019008995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_yasu320001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en.md new file mode 100644 index 00000000000000..2126ec09681301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_amitjain171980 XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_amitjain171980 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_amitjain171980` is a English model originally trained by amitjain171980. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en_5.5.0_3.0_1727018470843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amitjain171980_en_5.5.0_3.0_1727018470843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_amitjain171980","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_amitjain171980", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_amitjain171980| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en.md new file mode 100644 index 00000000000000..94a1cfa8e51046 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline pipeline XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline` is a English model originally trained by amitjain171980. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en_5.5.0_3.0_1727018532721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline_en_5.5.0_3.0_1727018532721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_amitjain171980_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_en.md new file mode 100644 index 00000000000000..75c6ae512e5cf5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_hcy5561_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hcy5561 XlmRoBertaForTokenClassification from hcy5561 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hcy5561 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_hcy5561` is a English model originally trained by hcy5561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hcy5561_en_5.5.0_3.0_1727019238386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hcy5561_en_5.5.0_3.0_1727019238386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hcy5561","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hcy5561", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hcy5561| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/hcy5561/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_en.md new file mode 100644 index 00000000000000..0a5805a8858149 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_xiao888 XlmRoBertaForTokenClassification from Xiao888 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_xiao888 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_xiao888` is a English model originally trained by Xiao888. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xiao888_en_5.5.0_3.0_1727018929609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xiao888_en_5.5.0_3.0_1727018929609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_xiao888","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_xiao888", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_xiao888| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Xiao888/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en.md new file mode 100644 index 00000000000000..679d84e26bb1ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline pipeline XlmRoBertaForTokenClassification from Xiao888 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline` is a English model originally trained by Xiao888. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en_5.5.0_3.0_1727018995264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline_en_5.5.0_3.0_1727018995264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_xiao888_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Xiao888/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en.md new file mode 100644 index 00000000000000..c726a4dd92a084 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline pipeline XlmRoBertaForTokenClassification from zeronin7 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline` is a English model originally trained by zeronin7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en_5.5.0_3.0_1726969923292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline_en_5.5.0_3.0_1726969923292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_zeronin7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/zeronin7/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_sh_zheng_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_sh_zheng_en.md new file mode 100644 index 00000000000000..f21e66358b5605 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_sh_zheng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sh_zheng XlmRoBertaForTokenClassification from sh-zheng +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sh_zheng +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_sh_zheng` is a English model originally trained by sh-zheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sh_zheng_en_5.5.0_3.0_1726970675630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sh_zheng_en_5.5.0_3.0_1726970675630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sh_zheng","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sh_zheng", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sh_zheng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/sh-zheng/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_en.md new file mode 100644 index 00000000000000..19b4a4de5919b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_yurit04 XlmRoBertaForTokenClassification from yurit04 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_yurit04 +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_yurit04` is a English model originally trained by yurit04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yurit04_en_5.5.0_3.0_1727018491544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yurit04_en_5.5.0_3.0_1727018491544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_yurit04","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_yurit04", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_yurit04| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/yurit04/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en.md new file mode 100644 index 00000000000000..799f120c715b24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline pipeline XlmRoBertaForTokenClassification from yurit04 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline` is a English model originally trained by yurit04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en_5.5.0_3.0_1727018573800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline_en_5.5.0_3.0_1727018573800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_yurit04_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/yurit04/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_en.md new file mode 100644 index 00000000000000..03f4c4ac9a2e39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_korean XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_korean +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_korean` is a English model originally trained by sungkwangjoong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_korean_en_5.5.0_3.0_1727019124460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_korean_en_5.5.0_3.0_1727019124460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_korean","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_korean", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_korean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-ko \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_pipeline_en.md new file mode 100644 index 00000000000000..28c9926d2c2324 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_korean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_korean_pipeline pipeline XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_korean_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_korean_pipeline` is a English model originally trained by sungkwangjoong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_korean_pipeline_en_5.5.0_3.0_1727019217304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_korean_pipeline_en_5.5.0_3.0_1727019217304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_korean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_korean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_korean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.7 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-ko + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en.md new file mode 100644 index 00000000000000..aac02e2c1deeb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline pipeline XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline` is a English model originally trained by neelrr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en_5.5.0_3.0_1727018576700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline_en_5.5.0_3.0_1727018576700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_tamil_neelrr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|833.6 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-ta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en.md new file mode 100644 index 00000000000000..4a8cb34e743de3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en_5.5.0_3.0_1727009935365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline_en_5.5.0_3.0_1727009935365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_nepal_bhasa_vietnam_aug_insert_synonym_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|795.7 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-New_VietNam-aug_insert_synonym-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en.md new file mode 100644 index 00000000000000..1b888e3faf7861 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_vietnam_aug_insert_w2v_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_vietnam_aug_insert_w2v_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vietnam_aug_insert_w2v_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en_5.5.0_3.0_1727009711975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_insert_w2v_pipeline_en_5.5.0_3.0_1727009711975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_vietnam_aug_insert_w2v_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_vietnam_aug_insert_w2v_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vietnam_aug_insert_w2v_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-VietNam-aug_insert_w2v + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlmr_english_chinese_all_shuffled_42_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlmr_english_chinese_all_shuffled_42_test1000_en.md new file mode 100644 index 00000000000000..f73263324f009a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlmr_english_chinese_all_shuffled_42_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_english_chinese_all_shuffled_42_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_english_chinese_all_shuffled_42_test1000 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_english_chinese_all_shuffled_42_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_42_test1000_en_5.5.0_3.0_1727009254823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_english_chinese_all_shuffled_42_test1000_en_5.5.0_3.0_1727009254823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_chinese_all_shuffled_42_test1000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_english_chinese_all_shuffled_42_test1000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_english_chinese_all_shuffled_42_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|826.9 MB| + +## References + +https://huggingface.co/patpizio/xlmr-en-zh-all_shuffled-42-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_en.md new file mode 100644 index 00000000000000..a789e6cf15b218 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q2_50p_filtered_2 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_50p_filtered_2 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_50p_filtered_2` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_50p_filtered_2_en_5.5.0_3.0_1727091907536.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_50p_filtered_2_en_5.5.0_3.0_1727091907536.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q2_50p_filtered_2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q2_50p_filtered_2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_50p_filtered_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-50p-filtered_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_pipeline_en.md new file mode 100644 index 00000000000000..1054862eb3d8ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_50p_filtered_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q2_50p_filtered_2_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_50p_filtered_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_50p_filtered_2_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_50p_filtered_2_pipeline_en_5.5.0_3.0_1727091929671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_50p_filtered_2_pipeline_en_5.5.0_3.0_1727091929671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q2_50p_filtered_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q2_50p_filtered_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_50p_filtered_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-50p-filtered_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_en.md new file mode 100644 index 00000000000000..2e882975ce74d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q2_75p_filtered_combined90 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_75p_filtered_combined90 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_75p_filtered_combined90` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_75p_filtered_combined90_en_5.5.0_3.0_1727056871223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_75p_filtered_combined90_en_5.5.0_3.0_1727056871223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q2_75p_filtered_combined90","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q2_75p_filtered_combined90","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_75p_filtered_combined90| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-75p-filtered_combined90 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_pipeline_en.md new file mode 100644 index 00000000000000..8d5b384ed369de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_75p_filtered_combined90_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q2_75p_filtered_combined90_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_75p_filtered_combined90_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_75p_filtered_combined90_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_75p_filtered_combined90_pipeline_en_5.5.0_3.0_1727056895264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_75p_filtered_combined90_pipeline_en_5.5.0_3.0_1727056895264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q2_75p_filtered_combined90_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q2_75p_filtered_combined90_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_75p_filtered_combined90_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-75p-filtered_combined90 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q2_90p_filtered_combined90_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_90p_filtered_combined90_en.md new file mode 100644 index 00000000000000..6e88d64fd4eb39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q2_90p_filtered_combined90_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q2_90p_filtered_combined90 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q2_90p_filtered_combined90 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q2_90p_filtered_combined90` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q2_90p_filtered_combined90_en_5.5.0_3.0_1727056740557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q2_90p_filtered_combined90_en_5.5.0_3.0_1727056740557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q2_90p_filtered_combined90","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q2_90p_filtered_combined90","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q2_90p_filtered_combined90| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q2-90p-filtered_combined90 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_pipeline_en.md new file mode 100644 index 00000000000000..4874d52c3883ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 3_roberta_0_pipeline pipeline RoBertaForSequenceClassification from prl90777 +author: John Snow Labs +name: 3_roberta_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`3_roberta_0_pipeline` is a English model originally trained by prl90777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/3_roberta_0_pipeline_en_5.5.0_3.0_1727055538073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/3_roberta_0_pipeline_en_5.5.0_3.0_1727055538073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("3_roberta_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("3_roberta_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|3_roberta_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|449.2 MB| + +## References + +https://huggingface.co/prl90777/3_roberta_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_en.md b/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_en.md new file mode 100644 index 00000000000000..f9e6edf89e30d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afriberta_small_hausa_5e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_small_hausa_5e_5 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afriberta_small_hausa_5e_5` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_small_hausa_5e_5_en_5.5.0_3.0_1727061476768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_small_hausa_5e_5_en_5.5.0_3.0_1727061476768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_small_hausa_5e_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_small_hausa_5e_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_small_hausa_5e_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-small-hausa-5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..d1838559ced798 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-afriberta_small_hausa_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afriberta_small_hausa_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_small_hausa_5e_5_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afriberta_small_hausa_5e_5_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_small_hausa_5e_5_pipeline_en_5.5.0_3.0_1727061492242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_small_hausa_5e_5_pipeline_en_5.5.0_3.0_1727061492242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afriberta_small_hausa_5e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afriberta_small_hausa_5e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_small_hausa_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-small-hausa-5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_en.md b/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_en.md new file mode 100644 index 00000000000000..c214d9b94ae9ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_seed_30 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_seed_30 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base_hausa_seed_30` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_seed_30_en_5.5.0_3.0_1727061485807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_seed_30_en_5.5.0_3.0_1727061485807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_seed_30","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_seed_30", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_seed_30| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-seed-30 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_pipeline_en.md new file mode 100644 index 00000000000000..56c5acb55020eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-afro_xlmr_base_hausa_seed_30_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_seed_30_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_seed_30_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base_hausa_seed_30_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_seed_30_pipeline_en_5.5.0_3.0_1727061539034.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_seed_30_pipeline_en_5.5.0_3.0_1727061539034.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_base_hausa_seed_30_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_base_hausa_seed_30_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_seed_30_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-seed-30 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ag_news_roberta_base_seed_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-ag_news_roberta_base_seed_2_en.md new file mode 100644 index 00000000000000..28ad0db2ee29b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ag_news_roberta_base_seed_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ag_news_roberta_base_seed_2 RoBertaForSequenceClassification from utahnlp +author: John Snow Labs +name: ag_news_roberta_base_seed_2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_roberta_base_seed_2` is a English model originally trained by utahnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_roberta_base_seed_2_en_5.5.0_3.0_1727055252287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_roberta_base_seed_2_en_5.5.0_3.0_1727055252287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("ag_news_roberta_base_seed_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ag_news_roberta_base_seed_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_roberta_base_seed_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.3 MB| + +## References + +https://huggingface.co/utahnlp/ag_news_roberta-base_seed-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-agnews_padding30model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-agnews_padding30model_pipeline_en.md new file mode 100644 index 00000000000000..8e77c27bf23535 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-agnews_padding30model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English agnews_padding30model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: agnews_padding30model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`agnews_padding30model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/agnews_padding30model_pipeline_en_5.5.0_3.0_1727059686642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/agnews_padding30model_pipeline_en_5.5.0_3.0_1727059686642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("agnews_padding30model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("agnews_padding30model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|agnews_padding30model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/agnews_padding30model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_en.md b/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_en.md new file mode 100644 index 00000000000000..8667c7292c650f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ai_generated_text_classification RoBertaForSequenceClassification from luciayn +author: John Snow Labs +name: ai_generated_text_classification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_generated_text_classification` is a English model originally trained by luciayn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_en_5.5.0_3.0_1727055227830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_en_5.5.0_3.0_1727055227830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("ai_generated_text_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ai_generated_text_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_generated_text_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/luciayn/ai-generated-text-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_pipeline_en.md new file mode 100644 index 00000000000000..5dc5bf778c9a9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ai_generated_text_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ai_generated_text_classification_pipeline pipeline RoBertaForSequenceClassification from luciayn +author: John Snow Labs +name: ai_generated_text_classification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_generated_text_classification_pipeline` is a English model originally trained by luciayn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_pipeline_en_5.5.0_3.0_1727055250930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_generated_text_classification_pipeline_en_5.5.0_3.0_1727055250930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ai_generated_text_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ai_generated_text_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_generated_text_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/luciayn/ai-generated-text-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_en.md b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_en.md new file mode 100644 index 00000000000000..1fe5a5bd96861b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_10_16_5 RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_10_16_5 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_10_16_5` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_10_16_5_en_5.5.0_3.0_1727055463492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_10_16_5_en_5.5.0_3.0_1727055463492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_10_16_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_banking_10_16_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_10_16_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-10-16-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_pipeline_en.md new file mode 100644 index 00000000000000..c8be765d729f0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-all_roberta_large_v1_banking_10_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_10_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_10_16_5_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_10_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_10_16_5_pipeline_en_5.5.0_3.0_1727055527422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_10_16_5_pipeline_en_5.5.0_3.0_1727055527422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_banking_10_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_banking_10_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_10_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-10-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-amharicnewsnoncleanedsmall_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-amharicnewsnoncleanedsmall_pipeline_en.md new file mode 100644 index 00000000000000..efb835f5b8aef2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-amharicnewsnoncleanedsmall_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amharicnewsnoncleanedsmall_pipeline pipeline XlmRoBertaForSequenceClassification from akiseid +author: John Snow Labs +name: amharicnewsnoncleanedsmall_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amharicnewsnoncleanedsmall_pipeline` is a English model originally trained by akiseid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amharicnewsnoncleanedsmall_pipeline_en_5.5.0_3.0_1727089188756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amharicnewsnoncleanedsmall_pipeline_en_5.5.0_3.0_1727089188756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amharicnewsnoncleanedsmall_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amharicnewsnoncleanedsmall_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amharicnewsnoncleanedsmall_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|839.3 MB| + +## References + +https://huggingface.co/akiseid/AmharicNewsNonCleanedSmall + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-augmented_model_fast_1_en.md b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_fast_1_en.md new file mode 100644 index 00000000000000..6c470e97d59ce1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_fast_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English augmented_model_fast_1 DistilBertForSequenceClassification from LeonardoFettucciari +author: John Snow Labs +name: augmented_model_fast_1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`augmented_model_fast_1` is a English model originally trained by LeonardoFettucciari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/augmented_model_fast_1_en_5.5.0_3.0_1727073517341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/augmented_model_fast_1_en_5.5.0_3.0_1727073517341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("augmented_model_fast_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("augmented_model_fast_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|augmented_model_fast_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeonardoFettucciari/augmented_model_fast_1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_en.md b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_en.md new file mode 100644 index 00000000000000..01b35beaf50ce8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-augmented_model_one_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English augmented_model_one DistilBertForSequenceClassification from LeonardoFettucciari +author: John Snow Labs +name: augmented_model_one +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`augmented_model_one` is a English model originally trained by LeonardoFettucciari. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/augmented_model_one_en_5.5.0_3.0_1727087106801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/augmented_model_one_en_5.5.0_3.0_1727087106801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("augmented_model_one","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("augmented_model_one", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|augmented_model_one| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeonardoFettucciari/augmented_model_one \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_en.md b/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_en.md new file mode 100644 index 00000000000000..b24933488b0d72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English babylm_roberta_base_epoch_15 RoBertaEmbeddings from Raj-Sanjay-Shah +author: John Snow Labs +name: babylm_roberta_base_epoch_15 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babylm_roberta_base_epoch_15` is a English model originally trained by Raj-Sanjay-Shah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_15_en_5.5.0_3.0_1727066228126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_15_en_5.5.0_3.0_1727066228126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("babylm_roberta_base_epoch_15","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("babylm_roberta_base_epoch_15","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babylm_roberta_base_epoch_15| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Raj-Sanjay-Shah/babyLM_roberta_base_epoch_15 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_pipeline_en.md new file mode 100644 index 00000000000000..8fa7a53fa33581 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-babylm_roberta_base_epoch_15_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English babylm_roberta_base_epoch_15_pipeline pipeline RoBertaEmbeddings from Raj-Sanjay-Shah +author: John Snow Labs +name: babylm_roberta_base_epoch_15_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babylm_roberta_base_epoch_15_pipeline` is a English model originally trained by Raj-Sanjay-Shah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_15_pipeline_en_5.5.0_3.0_1727066250168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babylm_roberta_base_epoch_15_pipeline_en_5.5.0_3.0_1727066250168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("babylm_roberta_base_epoch_15_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("babylm_roberta_base_epoch_15_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babylm_roberta_base_epoch_15_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/Raj-Sanjay-Shah/babyLM_roberta_base_epoch_15 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-base_12_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-23-base_12_pipeline_tr.md new file mode 100644 index 00000000000000..d0e3721ad9f3e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-base_12_pipeline_tr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Turkish base_12_pipeline pipeline WhisperForCTC from Mehtap +author: John Snow Labs +name: base_12_pipeline +date: 2024-09-23 +tags: [tr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_12_pipeline` is a Turkish model originally trained by Mehtap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_12_pipeline_tr_5.5.0_3.0_1727077345335.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_12_pipeline_tr_5.5.0_3.0_1727077345335.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_12_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_12_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|643.7 MB| + +## References + +https://huggingface.co/Mehtap/base_12 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-base_12_tr.md b/docs/_posts/ahmedlone127/2024-09-23-base_12_tr.md new file mode 100644 index 00000000000000..fcd69771ba7eb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-base_12_tr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Turkish base_12 WhisperForCTC from Mehtap +author: John Snow Labs +name: base_12 +date: 2024-09-23 +tags: [tr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_12` is a Turkish model originally trained by Mehtap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_12_tr_5.5.0_3.0_1727077312654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_12_tr_5.5.0_3.0_1727077312654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("base_12","tr") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("base_12", "tr") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_12| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|tr| +|Size:|643.7 MB| + +## References + +https://huggingface.co/Mehtap/base_12 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en.md new file mode 100644 index 00000000000000..1d6a2b9aaa7773 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en_5.5.0_3.0_1727052276622.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_en_5.5.0_3.0_1727052276622.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.3 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v5_wo_emsAssist-1-0.1-8-1e-05-tough-sweep-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en.md new file mode 100644 index 00000000000000..7a57450c272448 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline pipeline WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en_5.5.0_3.0_1727052312003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline_en_5.5.0_3.0_1727052312003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v5_wolof_emsassist_1_0_1_8_1e_05_tough_sweep_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.3 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v5_wo_emsAssist-1-0.1-8-1e-05-tough-sweep-2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bds_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bds_pipeline_en.md new file mode 100644 index 00000000000000..d303a0d01fa03d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bds_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bds_pipeline pipeline DistilBertForSequenceClassification from LogischeIP +author: John Snow Labs +name: bds_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bds_pipeline` is a English model originally trained by LogischeIP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bds_pipeline_en_5.5.0_3.0_1727087015908.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bds_pipeline_en_5.5.0_3.0_1727087015908.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bds_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bds_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bds_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LogischeIP/BDS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_en.md new file mode 100644 index 00000000000000..23c8230305dc08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_fake_news BertForSequenceClassification from elozano +author: John Snow Labs +name: bert_base_cased_fake_news +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_fake_news` is a English model originally trained by elozano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_fake_news_en_5.5.0_3.0_1727095289073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_fake_news_en_5.5.0_3.0_1727095289073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_fake_news","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_fake_news", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_fake_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/elozano/bert-base-cased-fake-news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_en.md new file mode 100644 index 00000000000000..4f42644ef631eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_squad2_finetuned_squad_chocolatehog BertForQuestionAnswering from chocolatehog +author: John Snow Labs +name: bert_base_cased_squad2_finetuned_squad_chocolatehog +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_squad2_finetuned_squad_chocolatehog` is a English model originally trained by chocolatehog. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad2_finetuned_squad_chocolatehog_en_5.5.0_3.0_1727070310824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad2_finetuned_squad_chocolatehog_en_5.5.0_3.0_1727070310824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_squad2_finetuned_squad_chocolatehog","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_squad2_finetuned_squad_chocolatehog", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_squad2_finetuned_squad_chocolatehog| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/chocolatehog/bert-base-cased-squad2-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en.md new file mode 100644 index 00000000000000..13d62e516f043f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline pipeline BertForQuestionAnswering from chocolatehog +author: John Snow Labs +name: bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline` is a English model originally trained by chocolatehog. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en_5.5.0_3.0_1727070331020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline_en_5.5.0_3.0_1727070331020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_squad2_finetuned_squad_chocolatehog_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/chocolatehog/bert-base-cased-squad2-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_xx.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_xx.md new file mode 100644 index 00000000000000..24b6e53ab5f404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_ner_hrl_nttaii BertForTokenClassification from nttaii +author: John Snow Labs +name: bert_base_multilingual_cased_ner_hrl_nttaii +date: 2024-09-23 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_ner_hrl_nttaii` is a Multilingual model originally trained by nttaii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_ner_hrl_nttaii_xx_5.5.0_3.0_1727060419777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_ner_hrl_nttaii_xx_5.5.0_3.0_1727060419777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_ner_hrl_nttaii","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_cased_ner_hrl_nttaii", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_ner_hrl_nttaii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|665.3 MB| + +## References + +https://huggingface.co/nttaii/bert-base-multilingual-cased-ner-hrl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_es.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_es.md new file mode 100644 index 00000000000000..d056cf94dd6f40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_nubes BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_nubes +date: 2024-09-23 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_nubes` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_nubes_es_5.5.0_3.0_1727060208662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_nubes_es_5.5.0_3.0_1727060208662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_nubes","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_nubes", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_nubes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-nubes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_pipeline_es.md new file mode 100644 index 00000000000000..68a6502081d748 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_spanish_wwm_cased_nubes_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_nubes_pipeline pipeline BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_nubes_pipeline +date: 2024-09-23 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_nubes_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_nubes_pipeline_es_5.5.0_3.0_1727060228253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_nubes_pipeline_es_5.5.0_3.0_1727060228253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_nubes_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_nubes_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_nubes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-nubes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncase_conll2012_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncase_conll2012_en.md new file mode 100644 index 00000000000000..ee070370850dba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncase_conll2012_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncase_conll2012 BertForTokenClassification from sarveshsk +author: John Snow Labs +name: bert_base_uncase_conll2012 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncase_conll2012` is a English model originally trained by sarveshsk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncase_conll2012_en_5.5.0_3.0_1727111304268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncase_conll2012_en_5.5.0_3.0_1727111304268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncase_conll2012","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncase_conll2012", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncase_conll2012| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/sarveshsk/bert_base_uncase_Conll2012 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_conll2003_joshuaphua_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_conll2003_joshuaphua_pipeline_en.md new file mode 100644 index 00000000000000..b94b9ed8b9c47b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_conll2003_joshuaphua_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_conll2003_joshuaphua_pipeline pipeline BertForTokenClassification from joshuaphua +author: John Snow Labs +name: bert_base_uncased_conll2003_joshuaphua_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_conll2003_joshuaphua_pipeline` is a English model originally trained by joshuaphua. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_conll2003_joshuaphua_pipeline_en_5.5.0_3.0_1727098274971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_conll2003_joshuaphua_pipeline_en_5.5.0_3.0_1727098274971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_conll2003_joshuaphua_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_conll2003_joshuaphua_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_conll2003_joshuaphua_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/joshuaphua/bert-base-uncased-conll2003 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en.md new file mode 100644 index 00000000000000..b6dbde2244a7a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en_5.5.0_3.0_1727050191426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline_en_5.5.0_3.0_1727050191426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_8_lr_1e_05_wd_0_001_dp_0_99999_swati_140000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.8-lr-1e-05-wd-0.001-dp-0.99999-ss-140000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..af2c00e78d86de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727049992673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727049992673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_10_0_lr_4e_07_wd_1e_05_dp_1_0_swati_100_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-10.0-lr-4e-07-wd-1e-05-dp-1.0-ss-100-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en.md new file mode 100644 index 00000000000000..34e87f140b1dc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en_5.5.0_3.0_1727050260036.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900_en_5.5.0_3.0_1727050260036.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_900| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.99999-ss-900 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en.md new file mode 100644 index 00000000000000..02b8b0d755c8ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727049749165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727049749165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.2-ss-4664-st-False-fh-True-hs-666 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en.md new file mode 100644 index 00000000000000..08ac2880664dd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727049773921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727049773921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_05_wd_0_001_dp_0_2_swati_4664_southern_sotho_false_fh_true_hs_666_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-05-wd-0.001-dp-0.2-ss-4664-st-False-fh-True-hs-666 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en.md new file mode 100644 index 00000000000000..b29ae22b84f298 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727050070238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_en_5.5.0_3.0_1727050070238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.2-ss-2882-st-False-fh-True-hs-666 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en.md new file mode 100644 index 00000000000000..f0aeed4f2ff1bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en_5.5.0_3.0_1727049758353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_en_5.5.0_3.0_1727049758353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-8e-06-wd-0.001-dp-0.999 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_en.md new file mode 100644 index 00000000000000..8212611910e771 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mrpc_w05230505 BertForSequenceClassification from w05230505 +author: John Snow Labs +name: bert_base_uncased_finetuned_mrpc_w05230505 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mrpc_w05230505` is a English model originally trained by w05230505. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mrpc_w05230505_en_5.5.0_3.0_1727095230954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mrpc_w05230505_en_5.5.0_3.0_1727095230954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mrpc_w05230505","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mrpc_w05230505", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mrpc_w05230505| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/w05230505/bert-base-uncased-finetuned-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en.md new file mode 100644 index 00000000000000..2b8824ace9517e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mrpc_w05230505_pipeline pipeline BertForSequenceClassification from w05230505 +author: John Snow Labs +name: bert_base_uncased_finetuned_mrpc_w05230505_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mrpc_w05230505_pipeline` is a English model originally trained by w05230505. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en_5.5.0_3.0_1727095250267.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mrpc_w05230505_pipeline_en_5.5.0_3.0_1727095250267.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_mrpc_w05230505_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_mrpc_w05230505_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mrpc_w05230505_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/w05230505/bert-base-uncased-finetuned-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en.md new file mode 100644 index 00000000000000..2f1c37b14f152e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline pipeline DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en_5.5.0_3.0_1727059869698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline_en_5.5.0_3.0_1727059869698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification_fullmodel_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification-fullmodel-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_en.md new file mode 100644 index 00000000000000..4984779a7253ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_ner_asos_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_asos_uncased BertForTokenClassification from vantagediscovery +author: John Snow Labs +name: bert_finetuned_ner_asos_uncased +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_asos_uncased` is a English model originally trained by vantagediscovery. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_asos_uncased_en_5.5.0_3.0_1727111828478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_asos_uncased_en_5.5.0_3.0_1727111828478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_asos_uncased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_asos_uncased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_asos_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/vantagediscovery/bert-finetuned-ner-asos-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_scmedium_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_scmedium_squad_pipeline_en.md new file mode 100644 index 00000000000000..f01c41498d2033 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_scmedium_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_cased_scmedium_squad_pipeline pipeline BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_large_cased_scmedium_squad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_scmedium_squad_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_scmedium_squad_pipeline_en_5.5.0_3.0_1727050127678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_scmedium_squad_pipeline_en_5.5.0_3.0_1727050127678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_cased_scmedium_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_cased_scmedium_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_scmedium_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-large-cased-scmedium-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en.md new file mode 100644 index 00000000000000..5d1235a55c2f11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline pipeline BertForQuestionAnswering from Ineract +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline` is a English model originally trained by Ineract. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en_5.5.0_3.0_1727050028079.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline_en_5.5.0_3.0_1727050028079.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_finetuned_policy_number_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Ineract/bert-large-uncased-whole-word-masking-finetuned-policy-number + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en.md new file mode 100644 index 00000000000000..a20cb1ec1cf7f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_finetuned_squad_dev_one BertForQuestionAnswering from mdzrg +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_finetuned_squad_dev_one +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_finetuned_squad_dev_one` is a English model originally trained by mdzrg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en_5.5.0_3.0_1727049678814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_finetuned_squad_dev_one_en_5.5.0_3.0_1727049678814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_squad_dev_one","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_finetuned_squad_dev_one", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_finetuned_squad_dev_one| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/mdzrg/bert-large-uncased-whole-word-masking-finetuned-squad-dev-one \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en.md new file mode 100644 index 00000000000000..902c194dd80e71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad BertForQuestionAnswering from haddadalwi +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad` is a English model originally trained by haddadalwi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en_5.5.0_3.0_1727050165039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_en_5.5.0_3.0_1727050165039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/haddadalwi/bert-large-uncased-whole-word-masking-squad2-finetuned-islamic-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en.md new file mode 100644 index 00000000000000..d36a8b8d8e962f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline pipeline BertForQuestionAnswering from haddadalwi +author: John Snow Labs +name: bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline` is a English model originally trained by haddadalwi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en_5.5.0_3.0_1727050228698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline_en_5.5.0_3.0_1727050228698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_whole_word_masking_squad2_finetuned_islamic_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/haddadalwi/bert-large-uncased-whole-word-masking-squad2-finetuned-islamic-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_en.md new file mode 100644 index 00000000000000..3b14ff42d2b47c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_poop_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_poop_0 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_poop_0 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_poop_0` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_poop_0_en_5.5.0_3.0_1727082112064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_poop_0_en_5.5.0_3.0_1727082112064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_poop_0","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_poop_0", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_poop_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_poop_0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_en.md new file mode 100644 index 00000000000000..a353a4a232cc77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tonga_tonga_islands_distilbert_ner_soniquentin BertForTokenClassification from soniquentin +author: John Snow Labs +name: bert_tonga_tonga_islands_distilbert_ner_soniquentin +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tonga_tonga_islands_distilbert_ner_soniquentin` is a English model originally trained by soniquentin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_soniquentin_en_5.5.0_3.0_1727060743700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_soniquentin_en_5.5.0_3.0_1727060743700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_tonga_tonga_islands_distilbert_ner_soniquentin","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_tonga_tonga_islands_distilbert_ner_soniquentin", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tonga_tonga_islands_distilbert_ner_soniquentin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/soniquentin/bert-to-distilbert-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en.md new file mode 100644 index 00000000000000..932a137a117cc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline pipeline BertForTokenClassification from soniquentin +author: John Snow Labs +name: bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline` is a English model originally trained by soniquentin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en_5.5.0_3.0_1727060755254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline_en_5.5.0_3.0_1727060755254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tonga_tonga_islands_distilbert_ner_soniquentin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/soniquentin/bert-to-distilbert-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_en.md b/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_en.md new file mode 100644 index 00000000000000..910831292e9ebf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bge_large_repmus_cross_entropy_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_repmus_cross_entropy BGEEmbeddings from tessimago +author: John Snow Labs +name: bge_large_repmus_cross_entropy +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_large_repmus_cross_entropy` is a English model originally trained by tessimago. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_repmus_cross_entropy_en_5.5.0_3.0_1727105946568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_repmus_cross_entropy_en_5.5.0_3.0_1727105946568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_repmus_cross_entropy","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_repmus_cross_entropy","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp).toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_repmus_cross_entropy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tessimago/bge-large-repmus-cross_entropy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_es.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_es.md new file mode 100644 index 00000000000000..b0b12cfdba889e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_livingner_humano RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_livingner_humano +date: 2024-09-23 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_livingner_humano` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_livingner_humano_es_5.5.0_3.0_1727081573115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_livingner_humano_es_5.5.0_3.0_1727081573115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_livingner_humano","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_livingner_humano", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_livingner_humano| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|454.4 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-livingner-humano \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es.md new file mode 100644 index 00000000000000..9f5db6c67c9341 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline +date: 2024-09-23 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es_5.5.0_3.0_1727081596372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline_es_5.5.0_3.0_1727081596372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_livingner_humano_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|454.4 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-livingner-humano + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_en.md new file mode 100644 index 00000000000000..1f5b62c29fccc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_distemist_ner RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_distemist_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_distemist_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_ner_en_5.5.0_3.0_1727072856064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_ner_en_5.5.0_3.0_1727072856064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_distemist_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_distemist_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_distemist_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|440.4 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-distemist-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_pipeline_en.md new file mode 100644 index 00000000000000..5c2e4c75cb5d29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_distemist_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_distemist_ner_pipeline pipeline RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_distemist_ner_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_distemist_ner_pipeline` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_ner_pipeline_en_5.5.0_3.0_1727072887817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_distemist_ner_pipeline_en_5.5.0_3.0_1727072887817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_distemist_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_distemist_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_distemist_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|440.5 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-distemist-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en.md new file mode 100644 index 00000000000000..93a1f7749c6e1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline pipeline RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en_5.5.0_3.0_1727115210618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline_en_5.5.0_3.0_1727115210618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.0 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-symptemist-word2vec-85-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_en.md new file mode 100644 index 00000000000000..067925c0795831 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_vih_rod RoBertaForSequenceClassification from Wariano +author: John Snow Labs +name: bsc_bio_ehr_spanish_vih_rod +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_vih_rod` is a English model originally trained by Wariano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_rod_en_5.5.0_3.0_1727055073441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_rod_en_5.5.0_3.0_1727055073441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("bsc_bio_ehr_spanish_vih_rod","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("bsc_bio_ehr_spanish_vih_rod", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_vih_rod| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.0 MB| + +## References + +https://huggingface.co/Wariano/bsc-bio-ehr-es-vih-rod \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_pipeline_en.md new file mode 100644 index 00000000000000..5bc872f9c53e43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_vih_rod_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_vih_rod_pipeline pipeline RoBertaForSequenceClassification from Wariano +author: John Snow Labs +name: bsc_bio_ehr_spanish_vih_rod_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_vih_rod_pipeline` is a English model originally trained by Wariano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_rod_pipeline_en_5.5.0_3.0_1727055107475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_vih_rod_pipeline_en_5.5.0_3.0_1727055107475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_vih_rod_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_vih_rod_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_vih_rod_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|437.0 MB| + +## References + +https://huggingface.co/Wariano/bsc-bio-ehr-es-vih-rod + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_en.md new file mode 100644 index 00000000000000..1c73ca809c86e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_abishines RoBertaEmbeddings from abishines +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_abishines +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_abishines` is a English model originally trained by abishines. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_abishines_en_5.5.0_3.0_1727091980181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_abishines_en_5.5.0_3.0_1727091980181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_abishines","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_abishines","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_abishines| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/abishines/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_en.md new file mode 100644 index 00000000000000..8df04212d275c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_eitanli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_eitanli RoBertaEmbeddings from Eitanli +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_eitanli +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_eitanli` is a English model originally trained by Eitanli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eitanli_en_5.5.0_3.0_1727080650223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eitanli_en_5.5.0_3.0_1727080650223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_eitanli","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_eitanli","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_eitanli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/Eitanli/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_en.md new file mode 100644 index 00000000000000..082699cce20e4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_waterboy111 RoBertaEmbeddings from waterboy111 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_waterboy111 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_waterboy111` is a English model originally trained by waterboy111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_waterboy111_en_5.5.0_3.0_1727066058256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_waterboy111_en_5.5.0_3.0_1727066058256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_waterboy111","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("burmese_awesome_eli5_mlm_model_waterboy111","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_waterboy111| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/waterboy111/my_awesome_eli5_mlm_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en.md new file mode 100644 index 00000000000000..3ec5f7ced9a1dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_waterboy111_pipeline pipeline RoBertaEmbeddings from waterboy111 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_waterboy111_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_waterboy111_pipeline` is a English model originally trained by waterboy111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en_5.5.0_3.0_1727066074227.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_waterboy111_pipeline_en_5.5.0_3.0_1727066074227.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_waterboy111_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_waterboy111_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_waterboy111_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/waterboy111/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_bertester_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_bertester_en.md new file mode 100644 index 00000000000000..86b693130f716c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_bertester_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_bertester DistilBertForSequenceClassification from bertester +author: John Snow Labs +name: burmese_awesome_model_bertester +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_bertester` is a English model originally trained by bertester. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bertester_en_5.5.0_3.0_1727059741813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_bertester_en_5.5.0_3.0_1727059741813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bertester","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_bertester", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_bertester| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bertester/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_pipeline_en.md new file mode 100644 index 00000000000000..28624435288c5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_beto_pipeline pipeline BertForSequenceClassification from maic1995 +author: John Snow Labs +name: burmese_awesome_model_beto_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_beto_pipeline` is a English model originally trained by maic1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_beto_pipeline_en_5.5.0_3.0_1727095446948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_beto_pipeline_en_5.5.0_3.0_1727095446948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_beto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_beto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_beto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.9 MB| + +## References + +https://huggingface.co/maic1995/my_awesome_model_beto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_pipeline_en.md new file mode 100644 index 00000000000000..6f90e4cc730e5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_brianrigoni_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_brianrigoni_pipeline pipeline DistilBertForSequenceClassification from brianrigoni +author: John Snow Labs +name: burmese_awesome_model_brianrigoni_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_brianrigoni_pipeline` is a English model originally trained by brianrigoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_brianrigoni_pipeline_en_5.5.0_3.0_1727097066695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_brianrigoni_pipeline_en_5.5.0_3.0_1727097066695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_brianrigoni_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_brianrigoni_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_brianrigoni_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/brianrigoni/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_en.md new file mode 100644 index 00000000000000..fae0c78e7ffedd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_eyeonyou DistilBertForSequenceClassification from eyeonyou +author: John Snow Labs +name: burmese_awesome_model_eyeonyou +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_eyeonyou` is a English model originally trained by eyeonyou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_eyeonyou_en_5.5.0_3.0_1727110501859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_eyeonyou_en_5.5.0_3.0_1727110501859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_eyeonyou","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_eyeonyou", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_eyeonyou| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eyeonyou/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_en.md new file mode 100644 index 00000000000000..8e87f7333419ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_habiba_2227 DistilBertForSequenceClassification from habiba-2227 +author: John Snow Labs +name: burmese_awesome_model_habiba_2227 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_habiba_2227` is a English model originally trained by habiba-2227. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_habiba_2227_en_5.5.0_3.0_1727059385052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_habiba_2227_en_5.5.0_3.0_1727059385052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_habiba_2227","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_habiba_2227", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_habiba_2227| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/habiba-2227/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_en.md new file mode 100644 index 00000000000000..677258c25f981f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_jbar646_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_jbar646 DistilBertForSequenceClassification from jbar646 +author: John Snow Labs +name: burmese_awesome_model_jbar646 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jbar646` is a English model originally trained by jbar646. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jbar646_en_5.5.0_3.0_1727059238431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jbar646_en_5.5.0_3.0_1727059238431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jbar646","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jbar646", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jbar646| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jbar646/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_en.md new file mode 100644 index 00000000000000..ee551c49925ae6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_julianorosco37 DistilBertForSequenceClassification from Julianorosco37 +author: John Snow Labs +name: burmese_awesome_model_julianorosco37 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_julianorosco37` is a English model originally trained by Julianorosco37. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_julianorosco37_en_5.5.0_3.0_1727082352214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_julianorosco37_en_5.5.0_3.0_1727082352214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_julianorosco37","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_julianorosco37", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_julianorosco37| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Julianorosco37/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_en.md new file mode 100644 index 00000000000000..69acfa74ee9e1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_kelvinleong RoBertaForSequenceClassification from kelvinleong +author: John Snow Labs +name: burmese_awesome_model_kelvinleong +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_kelvinleong` is a English model originally trained by kelvinleong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kelvinleong_en_5.5.0_3.0_1727055362644.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kelvinleong_en_5.5.0_3.0_1727055362644.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("burmese_awesome_model_kelvinleong","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("burmese_awesome_model_kelvinleong", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_kelvinleong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|424.1 MB| + +## References + +https://huggingface.co/kelvinleong/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_pipeline_en.md new file mode 100644 index 00000000000000..2989130641af6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_kelvinleong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_kelvinleong_pipeline pipeline RoBertaForSequenceClassification from kelvinleong +author: John Snow Labs +name: burmese_awesome_model_kelvinleong_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_kelvinleong_pipeline` is a English model originally trained by kelvinleong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kelvinleong_pipeline_en_5.5.0_3.0_1727055391924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_kelvinleong_pipeline_en_5.5.0_3.0_1727055391924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_kelvinleong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_kelvinleong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_kelvinleong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.1 MB| + +## References + +https://huggingface.co/kelvinleong/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_en.md new file mode 100644 index 00000000000000..09f68966a30f55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_nandini54_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_nandini54 DistilBertForSequenceClassification from Nandini54 +author: John Snow Labs +name: burmese_awesome_model_nandini54 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_nandini54` is a English model originally trained by Nandini54. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nandini54_en_5.5.0_3.0_1727108287838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_nandini54_en_5.5.0_3.0_1727108287838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_nandini54","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_nandini54", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_nandini54| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Nandini54/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_pipeline_en.md new file mode 100644 index 00000000000000..360cbc46b51951 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_omertnks_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_omertnks_pipeline pipeline DistilBertForSequenceClassification from omertnks +author: John Snow Labs +name: burmese_awesome_model_omertnks_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_omertnks_pipeline` is a English model originally trained by omertnks. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_omertnks_pipeline_en_5.5.0_3.0_1727110612838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_omertnks_pipeline_en_5.5.0_3.0_1727110612838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_omertnks_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_omertnks_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_omertnks_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/omertnks/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_en.md new file mode 100644 index 00000000000000..8c520d4c715502 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_awesome_qa_model_dennischan BertForQuestionAnswering from dennischan +author: John Snow Labs +name: burmese_awesome_qa_model_dennischan +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_dennischan` is a English model originally trained by dennischan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dennischan_en_5.5.0_3.0_1727049931660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dennischan_en_5.5.0_3.0_1727049931660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("burmese_awesome_qa_model_dennischan","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("burmese_awesome_qa_model_dennischan", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_dennischan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/dennischan/my_awesome_qa_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_en.md new file mode 100644 index 00000000000000..a4eaed158e01bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_wnut_model_efar98_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_efar98 DistilBertForTokenClassification from Efar98 +author: John Snow Labs +name: burmese_awesome_wnut_model_efar98 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_efar98` is a English model originally trained by Efar98. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_efar98_en_5.5.0_3.0_1727065440434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_efar98_en_5.5.0_3.0_1727065440434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_efar98","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_efar98", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_efar98| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Efar98/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_en.md new file mode 100644 index 00000000000000..d50d77324a8550 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_textclassification_model DistilBertForSequenceClassification from Happpy0413 +author: John Snow Labs +name: burmese_textclassification_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_textclassification_model` is a English model originally trained by Happpy0413. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_textclassification_model_en_5.5.0_3.0_1727059119275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_textclassification_model_en_5.5.0_3.0_1727059119275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_textclassification_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_textclassification_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_textclassification_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Happpy0413/my_textclassification_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-canbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-canbert_pipeline_en.md new file mode 100644 index 00000000000000..12688705bc5f19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-canbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English canbert_pipeline pipeline RoBertaEmbeddings from ebelenwaf +author: John Snow Labs +name: canbert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`canbert_pipeline` is a English model originally trained by ebelenwaf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/canbert_pipeline_en_5.5.0_3.0_1727056597928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/canbert_pipeline_en_5.5.0_3.0_1727056597928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("canbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("canbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|canbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/ebelenwaf/canbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_en.md new file mode 100644 index 00000000000000..63e25b982053f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cebfil_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cebfil_roberta RoBertaEmbeddings from jfernandez +author: John Snow Labs +name: cebfil_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cebfil_roberta` is a English model originally trained by jfernandez. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cebfil_roberta_en_5.5.0_3.0_1727057002605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cebfil_roberta_en_5.5.0_3.0_1727057002605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("cebfil_roberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("cebfil_roberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cebfil_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|469.5 MB| + +## References + +https://huggingface.co/jfernandez/cebfil-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_en.md b/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_en.md new file mode 100644 index 00000000000000..14ca07d64c927e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classifier_main_subjects_technology RoBertaForSequenceClassification from gptmurdock +author: John Snow Labs +name: classifier_main_subjects_technology +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_main_subjects_technology` is a English model originally trained by gptmurdock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_main_subjects_technology_en_5.5.0_3.0_1727086265650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_main_subjects_technology_en_5.5.0_3.0_1727086265650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("classifier_main_subjects_technology","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("classifier_main_subjects_technology", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_main_subjects_technology| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/gptmurdock/classifier-main_subjects_technology \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_pipeline_en.md new file mode 100644 index 00000000000000..b1cde9d34b567c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-classifier_main_subjects_technology_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifier_main_subjects_technology_pipeline pipeline RoBertaForSequenceClassification from gptmurdock +author: John Snow Labs +name: classifier_main_subjects_technology_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_main_subjects_technology_pipeline` is a English model originally trained by gptmurdock. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_main_subjects_technology_pipeline_en_5.5.0_3.0_1727086289605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_main_subjects_technology_pipeline_en_5.5.0_3.0_1727086289605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classifier_main_subjects_technology_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classifier_main_subjects_technology_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_main_subjects_technology_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.9 MB| + +## References + +https://huggingface.co/gptmurdock/classifier-main_subjects_technology + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_pipeline_en.md new file mode 100644 index 00000000000000..a6050c67292e9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifier_tensorride_pipeline pipeline DistilBertForSequenceClassification from Tensorride +author: John Snow Labs +name: classifier_tensorride_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_tensorride_pipeline` is a English model originally trained by Tensorride. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_tensorride_pipeline_en_5.5.0_3.0_1727059256095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_tensorride_pipeline_en_5.5.0_3.0_1727059256095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classifier_tensorride_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classifier_tensorride_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_tensorride_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Tensorride/Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-climate_change_en.md b/docs/_posts/ahmedlone127/2024-09-23-climate_change_en.md new file mode 100644 index 00000000000000..addff67c1f1d61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-climate_change_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English climate_change RoBertaEmbeddings from DhirajKumarSahu +author: John Snow Labs +name: climate_change +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`climate_change` is a English model originally trained by DhirajKumarSahu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/climate_change_en_5.5.0_3.0_1727066208595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/climate_change_en_5.5.0_3.0_1727066208595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("climate_change","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("climate_change","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|climate_change| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/DhirajKumarSahu/climate_change \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-climate_change_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-climate_change_pipeline_en.md new file mode 100644 index 00000000000000..71df9b8ada85b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-climate_change_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English climate_change_pipeline pipeline RoBertaEmbeddings from DhirajKumarSahu +author: John Snow Labs +name: climate_change_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`climate_change_pipeline` is a English model originally trained by DhirajKumarSahu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/climate_change_pipeline_en_5.5.0_3.0_1727066259462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/climate_change_pipeline_en_5.5.0_3.0_1727066259462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("climate_change_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("climate_change_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|climate_change_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/DhirajKumarSahu/climate_change + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_en.md new file mode 100644 index 00000000000000..7c19af396317dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-code_search_codebert_base_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English code_search_codebert_base_2 RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_2 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_2` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_2_en_5.5.0_3.0_1727081601856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_2_en_5.5.0_3.0_1727081601856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_en.md new file mode 100644 index 00000000000000..1cfaf7c47fe12e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr28_seed3 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr28_seed3 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr28_seed3` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr28_seed3_en_5.5.0_3.0_1727055385938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr28_seed3_en_5.5.0_3.0_1727055385938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr28_seed3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr28_seed3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr28_seed3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr28-seed3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_pipeline_en.md new file mode 100644 index 00000000000000..fa5aae6b0e1301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr28_seed3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr28_seed3_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr28_seed3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr28_seed3_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr28_seed3_pipeline_en_5.5.0_3.0_1727055408875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr28_seed3_pipeline_en_5.5.0_3.0_1727055408875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr28_seed3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr28_seed3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr28_seed3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr28-seed3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-comnumpndistilbertv1_big_en.md b/docs/_posts/ahmedlone127/2024-09-23-comnumpndistilbertv1_big_en.md new file mode 100644 index 00000000000000..74b8a1a28a5d6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-comnumpndistilbertv1_big_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English comnumpndistilbertv1_big DistilBertForSequenceClassification from abbassix +author: John Snow Labs +name: comnumpndistilbertv1_big +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`comnumpndistilbertv1_big` is a English model originally trained by abbassix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/comnumpndistilbertv1_big_en_5.5.0_3.0_1727086751840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/comnumpndistilbertv1_big_en_5.5.0_3.0_1727086751840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("comnumpndistilbertv1_big","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("comnumpndistilbertv1_big", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|comnumpndistilbertv1_big| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|250.4 MB| + +## References + +https://huggingface.co/abbassix/ComNumPNdistilBERTv1-big \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en.md new file mode 100644 index 00000000000000..5bae7ca608497d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en_5.5.0_3.0_1727111290012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline_en_5.5.0_3.0_1727111290012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_all_01_03_2022_15_52_19_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_all_01_03_2022-15_52_19 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_en.md new file mode 100644 index 00000000000000..df8b7044394f48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cos_tapt_n_roberta RoBertaEmbeddings from Kyleiwaniec +author: John Snow Labs +name: cos_tapt_n_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cos_tapt_n_roberta` is a English model originally trained by Kyleiwaniec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cos_tapt_n_roberta_en_5.5.0_3.0_1727066055984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cos_tapt_n_roberta_en_5.5.0_3.0_1727066055984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("cos_tapt_n_roberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("cos_tapt_n_roberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cos_tapt_n_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Kyleiwaniec/COS_TAPT_n_RoBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_pipeline_en.md new file mode 100644 index 00000000000000..a0572a045cf995 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cos_tapt_n_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cos_tapt_n_roberta_pipeline pipeline RoBertaEmbeddings from Kyleiwaniec +author: John Snow Labs +name: cos_tapt_n_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cos_tapt_n_roberta_pipeline` is a English model originally trained by Kyleiwaniec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cos_tapt_n_roberta_pipeline_en_5.5.0_3.0_1727066118449.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cos_tapt_n_roberta_pipeline_en_5.5.0_3.0_1727066118449.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cos_tapt_n_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cos_tapt_n_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cos_tapt_n_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Kyleiwaniec/COS_TAPT_n_RoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_en.md b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_en.md new file mode 100644 index 00000000000000..3799d4c69bc877 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English covid_roberta_25 RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_25 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_25` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_25_en_5.5.0_3.0_1727092053022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_25_en_5.5.0_3.0_1727092053022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("covid_roberta_25","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("covid_roberta_25","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_25 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_en.md b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_en.md new file mode 100644 index 00000000000000..6b96b87a528677 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English covid_roberta_25_masked RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_25_masked +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_25_masked` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_25_masked_en_5.5.0_3.0_1727091906525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_25_masked_en_5.5.0_3.0_1727091906525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("covid_roberta_25_masked","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("covid_roberta_25_masked","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_25_masked| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_25_masked \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_pipeline_en.md new file mode 100644 index 00000000000000..dfab35fe39540c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_masked_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English covid_roberta_25_masked_pipeline pipeline RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_25_masked_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_25_masked_pipeline` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_25_masked_pipeline_en_5.5.0_3.0_1727091972081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_25_masked_pipeline_en_5.5.0_3.0_1727091972081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("covid_roberta_25_masked_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("covid_roberta_25_masked_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_25_masked_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_25_masked + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_pipeline_en.md new file mode 100644 index 00000000000000..ed4896bcbd2067 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English db_mc_9_2_pipeline pipeline DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc_9_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc_9_2_pipeline` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc_9_2_pipeline_en_5.5.0_3.0_1727059236326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc_9_2_pipeline_en_5.5.0_3.0_1727059236326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("db_mc_9_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("db_mc_9_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc_9_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/exala/db_mc_9.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_en.md b/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_en.md new file mode 100644 index 00000000000000..bab99a1a9a137d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English detect_femicide_news_xlmr_dutch_mono_freeze2 XlmRoBertaForSequenceClassification from gossminn +author: John Snow Labs +name: detect_femicide_news_xlmr_dutch_mono_freeze2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`detect_femicide_news_xlmr_dutch_mono_freeze2` is a English model originally trained by gossminn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/detect_femicide_news_xlmr_dutch_mono_freeze2_en_5.5.0_3.0_1727100177395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/detect_femicide_news_xlmr_dutch_mono_freeze2_en_5.5.0_3.0_1727100177395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("detect_femicide_news_xlmr_dutch_mono_freeze2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("detect_femicide_news_xlmr_dutch_mono_freeze2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|detect_femicide_news_xlmr_dutch_mono_freeze2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|655.1 MB| + +## References + +https://huggingface.co/gossminn/detect-femicide-news-xlmr-nl-mono-freeze2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en.md new file mode 100644 index 00000000000000..e8ef678fc96e47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline pipeline XlmRoBertaForSequenceClassification from gossminn +author: John Snow Labs +name: detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline` is a English model originally trained by gossminn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en_5.5.0_3.0_1727100356459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline_en_5.5.0_3.0_1727100356459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|detect_femicide_news_xlmr_dutch_mono_freeze2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|655.2 MB| + +## References + +https://huggingface.co/gossminn/detect-femicide-news-xlmr-nl-mono-freeze2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-detector_god2_en.md b/docs/_posts/ahmedlone127/2024-09-23-detector_god2_en.md new file mode 100644 index 00000000000000..73d4b2345bdecb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-detector_god2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English detector_god2 XlmRoBertaForSequenceClassification from Sydelabs +author: John Snow Labs +name: detector_god2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`detector_god2` is a English model originally trained by Sydelabs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/detector_god2_en_5.5.0_3.0_1727088261823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/detector_god2_en_5.5.0_3.0_1727088261823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("detector_god2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("detector_god2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|detector_god2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.1 GB| + +## References + +https://huggingface.co/Sydelabs/detector_god2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_en.md new file mode 100644 index 00000000000000..1c61e0dd548e20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English diffusion_robustness_imdb RoBertaEmbeddings from Maybe1407 +author: John Snow Labs +name: diffusion_robustness_imdb +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`diffusion_robustness_imdb` is a English model originally trained by Maybe1407. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/diffusion_robustness_imdb_en_5.5.0_3.0_1727080458962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/diffusion_robustness_imdb_en_5.5.0_3.0_1727080458962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("diffusion_robustness_imdb","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("diffusion_robustness_imdb","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|diffusion_robustness_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Maybe1407/diffusion_robustness_imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_pipeline_en.md new file mode 100644 index 00000000000000..8fb6844f33a202 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-diffusion_robustness_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English diffusion_robustness_imdb_pipeline pipeline RoBertaEmbeddings from Maybe1407 +author: John Snow Labs +name: diffusion_robustness_imdb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`diffusion_robustness_imdb_pipeline` is a English model originally trained by Maybe1407. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/diffusion_robustness_imdb_pipeline_en_5.5.0_3.0_1727080527683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/diffusion_robustness_imdb_pipeline_en_5.5.0_3.0_1727080527683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("diffusion_robustness_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("diffusion_robustness_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|diffusion_robustness_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Maybe1407/diffusion_robustness_imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_en.md new file mode 100644 index 00000000000000..c78a7d0c649608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding50model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_agnews_padding50model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding50model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding50model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding50model_en_5.5.0_3.0_1727087106129.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding50model_en_5.5.0_3.0_1727087106129.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding50model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding50model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding50model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding50model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_en.md new file mode 100644 index 00000000000000..a293c4b11191dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_agnews_padding80model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding80model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding80model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding80model_en_5.5.0_3.0_1727059834158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding80model_en_5.5.0_3.0_1727059834158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding80model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_agnews_padding80model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding80model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding80model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en.md new file mode 100644 index 00000000000000..d4128211615834 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi DistilBertForSequenceClassification from Sohaibsoussi +author: John Snow Labs +name: distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi` is a English model originally trained by Sohaibsoussi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en_5.5.0_3.0_1727087216585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_en_5.5.0_3.0_1727087216585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Sohaibsoussi/distilbert-base-uncased-distilled-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_en.md new file mode 100644 index 00000000000000..3c63c5ffbedf3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_classifier DistilBertForSequenceClassification from automatichamster +author: John Snow Labs +name: distilbert_base_uncased_emotion_classifier +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_classifier` is a English model originally trained by automatichamster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_classifier_en_5.5.0_3.0_1727087130717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_classifier_en_5.5.0_3.0_1727087130717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_emotion_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/automatichamster/distilbert-base-uncased-emotion-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_pipeline_en.md new file mode 100644 index 00000000000000..8269e9b8595b5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_emotion_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_classifier_pipeline pipeline DistilBertForSequenceClassification from automatichamster +author: John Snow Labs +name: distilbert_base_uncased_emotion_classifier_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_classifier_pipeline` is a English model originally trained by automatichamster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_classifier_pipeline_en_5.5.0_3.0_1727087143250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_classifier_pipeline_en_5.5.0_3.0_1727087143250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_emotion_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_emotion_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/automatichamster/distilbert-base-uncased-emotion-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en.md new file mode 100644 index 00000000000000..9520ffb1c5b741 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi DistilBertForSequenceClassification from alkhwarizmi +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi` is a English model originally trained by alkhwarizmi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en_5.5.0_3.0_1727108287535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi_en_5.5.0_3.0_1727108287535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw1_alkhwarizmi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/alkhwarizmi/distilbert-base-uncased-finetuned-adl_hw1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_en.md new file mode 100644 index 00000000000000..112ebfda99951b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_adl_hw_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_adl_hw DistilBertForSequenceClassification from Hongu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_adl_hw +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_adl_hw` is a English model originally trained by Hongu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw_en_5.5.0_3.0_1727074043952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_adl_hw_en_5.5.0_3.0_1727074043952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_adl_hw", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_adl_hw| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Hongu/distilbert-base-uncased-finetuned-adl_hw \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_en.md new file mode 100644 index 00000000000000..a2edc00b815b17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_bobtk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_bobtk DistilBertForSequenceClassification from bobtk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_bobtk +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_bobtk` is a English model originally trained by bobtk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bobtk_en_5.5.0_3.0_1727093709989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bobtk_en_5.5.0_3.0_1727093709989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_bobtk","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_bobtk", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_bobtk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/bobtk/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en.md new file mode 100644 index 00000000000000..b03d5bae586cdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline pipeline DistilBertForSequenceClassification from cjfghk5697 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline` is a English model originally trained by cjfghk5697. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en_5.5.0_3.0_1727059239264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline_en_5.5.0_3.0_1727059239264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_cjfghk5697_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/cjfghk5697/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_en.md new file mode 100644 index 00000000000000..0277273df754e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_dodiaz2111 DistilBertForSequenceClassification from dodiaz2111 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_dodiaz2111 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_dodiaz2111` is a English model originally trained by dodiaz2111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dodiaz2111_en_5.5.0_3.0_1727074031811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dodiaz2111_en_5.5.0_3.0_1727074031811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dodiaz2111","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_dodiaz2111", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_dodiaz2111| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dodiaz2111/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_en.md new file mode 100644 index 00000000000000..14054395aca005 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_addie11 DistilBertForSequenceClassification from addie11 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_addie11 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_addie11` is a English model originally trained by addie11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_addie11_en_5.5.0_3.0_1727059343971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_addie11_en_5.5.0_3.0_1727059343971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_addie11","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_addie11", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_addie11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/addie11/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_pipeline_en.md new file mode 100644 index 00000000000000..21c514eed983b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_addie11_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_addie11_pipeline pipeline DistilBertForSequenceClassification from addie11 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_addie11_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_addie11_pipeline` is a English model originally trained by addie11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_addie11_pipeline_en_5.5.0_3.0_1727059355826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_addie11_pipeline_en_5.5.0_3.0_1727059355826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_addie11_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_addie11_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_addie11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/addie11/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en.md new file mode 100644 index 00000000000000..99a1733c346c86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_gamallo_pipeline pipeline DistilBertForSequenceClassification from gamallo +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_gamallo_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_gamallo_pipeline` is a English model originally trained by gamallo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en_5.5.0_3.0_1727059387763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_gamallo_pipeline_en_5.5.0_3.0_1727059387763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_gamallo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_gamallo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_gamallo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gamallo/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_en.md new file mode 100644 index 00000000000000..77078fba799dc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adidae DistilBertForSequenceClassification from Adidae +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adidae +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adidae` is a English model originally trained by Adidae. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adidae_en_5.5.0_3.0_1727059564967.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adidae_en_5.5.0_3.0_1727059564967.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adidae","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adidae", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adidae| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Adidae/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en.md new file mode 100644 index 00000000000000..aa0ae06c624c43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adidae_pipeline pipeline DistilBertForSequenceClassification from Adidae +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adidae_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adidae_pipeline` is a English model originally trained by Adidae. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en_5.5.0_3.0_1727059577102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adidae_pipeline_en_5.5.0_3.0_1727059577102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adidae_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adidae_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adidae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Adidae/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en.md new file mode 100644 index 00000000000000..902a4299ea2b4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cereline_pipeline pipeline DistilBertForSequenceClassification from cereline +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cereline_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cereline_pipeline` is a English model originally trained by cereline. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en_5.5.0_3.0_1727110622417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cereline_pipeline_en_5.5.0_3.0_1727110622417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cereline_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cereline_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cereline_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cereline/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_en.md new file mode 100644 index 00000000000000..85ff8714d9bc15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cogsci13 DistilBertForSequenceClassification from cogsci13 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cogsci13 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cogsci13` is a English model originally trained by cogsci13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cogsci13_en_5.5.0_3.0_1727082509540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cogsci13_en_5.5.0_3.0_1727082509540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cogsci13","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_cogsci13", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cogsci13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cogsci13/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_en.md new file mode 100644 index 00000000000000..161ab6b3dcef2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_devs0n DistilBertForSequenceClassification from devs0n +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_devs0n +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_devs0n` is a English model originally trained by devs0n. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devs0n_en_5.5.0_3.0_1727059472829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devs0n_en_5.5.0_3.0_1727059472829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_devs0n","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_devs0n", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_devs0n| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/devs0n/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en.md new file mode 100644 index 00000000000000..50034e86d97a3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_hun0520_pipeline pipeline DistilBertForSequenceClassification from hun0520 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_hun0520_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_hun0520_pipeline` is a English model originally trained by hun0520. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en_5.5.0_3.0_1727093944930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_hun0520_pipeline_en_5.5.0_3.0_1727093944930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hun0520_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_hun0520_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_hun0520_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hun0520/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en.md new file mode 100644 index 00000000000000..9d97827efa3b46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline pipeline DistilBertForSequenceClassification from jerilseb +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline` is a English model originally trained by jerilseb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en_5.5.0_3.0_1727097310875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline_en_5.5.0_3.0_1727097310875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jerilseb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jerilseb/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_en.md new file mode 100644 index 00000000000000..f48af0f66bbd03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_lulu5131 DistilBertForSequenceClassification from lulu5131 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_lulu5131 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_lulu5131` is a English model originally trained by lulu5131. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lulu5131_en_5.5.0_3.0_1727073842892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lulu5131_en_5.5.0_3.0_1727073842892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lulu5131","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_lulu5131", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_lulu5131| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lulu5131/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en.md new file mode 100644 index 00000000000000..024488c948fe9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline pipeline DistilBertForSequenceClassification from lulu5131 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline` is a English model originally trained by lulu5131. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en_5.5.0_3.0_1727073854903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline_en_5.5.0_3.0_1727073854903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_lulu5131_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lulu5131/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_matvey67_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_matvey67_en.md new file mode 100644 index 00000000000000..be2aeb90c16188 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_matvey67_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_matvey67 DistilBertForSequenceClassification from Matvey67 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_matvey67 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_matvey67` is a English model originally trained by Matvey67. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_matvey67_en_5.5.0_3.0_1727073944867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_matvey67_en_5.5.0_3.0_1727073944867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_matvey67","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_matvey67", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_matvey67| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Matvey67/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_en.md new file mode 100644 index 00000000000000..4028fbd34bbe20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_praneelnihar DistilBertForSequenceClassification from PraneelNihar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_praneelnihar +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_praneelnihar` is a English model originally trained by PraneelNihar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_praneelnihar_en_5.5.0_3.0_1727093694045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_praneelnihar_en_5.5.0_3.0_1727093694045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_praneelnihar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_praneelnihar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_praneelnihar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PraneelNihar/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en.md new file mode 100644 index 00000000000000..2a60a814694861 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline pipeline DistilBertForSequenceClassification from PraneelNihar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline` is a English model originally trained by PraneelNihar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en_5.5.0_3.0_1727093707231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline_en_5.5.0_3.0_1727093707231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_praneelnihar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/PraneelNihar/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en.md new file mode 100644 index 00000000000000..d8938a70d1be66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline pipeline DistilBertForSequenceClassification from zcstarr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline` is a English model originally trained by zcstarr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en_5.5.0_3.0_1727059337813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline_en_5.5.0_3.0_1727059337813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_zcstarr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zcstarr/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_en.md new file mode 100644 index 00000000000000..0de483a5d61836 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_ziwone_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ziwone DistilBertForSequenceClassification from ziwone +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ziwone +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ziwone` is a English model originally trained by ziwone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ziwone_en_5.5.0_3.0_1727074057508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ziwone_en_5.5.0_3.0_1727074057508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ziwone","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ziwone", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ziwone| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ziwone/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_en.md new file mode 100644 index 00000000000000..2a627b3ecc85eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_intro2_verizon DistilBertForSequenceClassification from TieIncred +author: John Snow Labs +name: distilbert_base_uncased_finetuned_intro2_verizon +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_intro2_verizon` is a English model originally trained by TieIncred. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intro2_verizon_en_5.5.0_3.0_1727086974505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intro2_verizon_en_5.5.0_3.0_1727086974505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_intro2_verizon","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_intro2_verizon", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_intro2_verizon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TieIncred/distilbert-base-uncased-finetuned-intro2-verizon \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_en.md new file mode 100644 index 00000000000000..f00a8e90023d12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_pad_clf_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_pad_clf_v2 DistilBertForSequenceClassification from netoferraz +author: John Snow Labs +name: distilbert_base_uncased_finetuned_pad_clf_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_pad_clf_v2` is a English model originally trained by netoferraz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_clf_v2_en_5.5.0_3.0_1727110387504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_pad_clf_v2_en_5.5.0_3.0_1727110387504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_pad_clf_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_pad_clf_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_pad_clf_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/netoferraz/distilbert-base-uncased-finetuned-pad-clf-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en.md new file mode 100644 index 00000000000000..d9c58d8bf047a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline pipeline DistilBertForSequenceClassification from Beijaflor2024 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline` is a English model originally trained by Beijaflor2024. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en_5.5.0_3.0_1727093998673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline_en_5.5.0_3.0_1727093998673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst_2_english_beijaflor2024_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Beijaflor2024/distilbert-base-uncased-finetuned-sst-2-english + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en.md new file mode 100644 index 00000000000000..603f749d1ddbe7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_tweets_dataset_pipeline pipeline DistilBertForSequenceClassification from lambda101 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_tweets_dataset_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_tweets_dataset_pipeline` is a English model originally trained by lambda101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en_5.5.0_3.0_1727086767715.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_tweets_dataset_pipeline_en_5.5.0_3.0_1727086767715.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_tweets_dataset_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_tweets_dataset_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_tweets_dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lambda101/distilbert-base-uncased-finetuned-tweets-dataset + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_pipeline_en.md new file mode 100644 index 00000000000000..fe6bc3e3f253a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finteuned_emotion_pipeline pipeline DistilBertForSequenceClassification from sknera +author: John Snow Labs +name: distilbert_base_uncased_finteuned_emotion_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finteuned_emotion_pipeline` is a English model originally trained by sknera. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finteuned_emotion_pipeline_en_5.5.0_3.0_1727074074068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finteuned_emotion_pipeline_en_5.5.0_3.0_1727074074068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finteuned_emotion_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finteuned_emotion_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finteuned_emotion_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sknera/distilbert-base-uncased-finteuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_en.md new file mode 100644 index 00000000000000..4a40a9455edc21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_imdb_dfurman DistilBertForSequenceClassification from dfurman +author: John Snow Labs +name: distilbert_base_uncased_imdb_dfurman +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_imdb_dfurman` is a English model originally trained by dfurman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_dfurman_en_5.5.0_3.0_1727086925352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_dfurman_en_5.5.0_3.0_1727086925352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_imdb_dfurman","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_imdb_dfurman", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_imdb_dfurman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dfurman/distilbert-base-uncased-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en.md new file mode 100644 index 00000000000000..7c68cc18b70681 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_lora_text_classification_lincgr_pipeline pipeline DistilBertForSequenceClassification from lincgr +author: John Snow Labs +name: distilbert_base_uncased_lora_text_classification_lincgr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_lora_text_classification_lincgr_pipeline` is a English model originally trained by lincgr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en_5.5.0_3.0_1727059569407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_lora_text_classification_lincgr_pipeline_en_5.5.0_3.0_1727059569407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_lora_text_classification_lincgr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_lora_text_classification_lincgr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_lora_text_classification_lincgr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lincgr/distilbert-base-uncased-lora-text-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en.md new file mode 100644 index 00000000000000..6d86c76286e517 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en_5.5.0_3.0_1727059409187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_en_5.5.0_3.0_1727059409187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut5_PLPrefix0stlarge13_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..1229d9b810808b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en_5.5.0_3.0_1727059420921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline_en_5.5.0_3.0_1727059420921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge13_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut5_PLPrefix0stlarge13_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en.md new file mode 100644 index 00000000000000..37801adef19cd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en_5.5.0_3.0_1727059122032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_en_5.5.0_3.0_1727059122032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut72ut1large14PfxNf_simsp400_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en.md new file mode 100644 index 00000000000000..079fa026aaa050 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1727059139627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1727059139627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st14sd_ut72ut1large14pfxnf_simsp400_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st14sd_ut72ut1large14PfxNf_simsp400_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md new file mode 100644 index 00000000000000..534d19c5bc6ac4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en_5.5.0_3.0_1727059527931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_en_5.5.0_3.0_1727059527931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st15sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md new file mode 100644 index 00000000000000..46f4b7b970d174 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1727059540372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline_en_5.5.0_3.0_1727059540372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st15sd_ut72ut1_plprefix0stlarge_simsp300_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st15sd_ut72ut1_PLPrefix0stlarge_simsp300_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..dc6b692c0cf884 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727110613987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727110613987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut3_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en.md new file mode 100644 index 00000000000000..bfa9f4a030e145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en_5.5.0_3.0_1727108509148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline_en_5.5.0_3.0_1727108509148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge_simsp300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st19sd_ut72ut1_PLPrefix0stlarge_simsp300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md new file mode 100644 index 00000000000000..e5b3f1adf3f15a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en_5.5.0_3.0_1727073953066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_en_5.5.0_3.0_1727073953066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md new file mode 100644 index 00000000000000..e8f91cdf824809 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1727073969499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1727073969499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en.md new file mode 100644 index 00000000000000..613b125070f3f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en_5.5.0_3.0_1727108642673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp_en_5.5.0_3.0_1727108642673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge30_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge30_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en.md new file mode 100644 index 00000000000000..fad029a3c29576 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en_5.5.0_3.0_1727082444097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd_en_5.5.0_3.0_1727082444097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean1sd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut5_PLPrefix0stlarge42_simsp_clean1sd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en.md new file mode 100644 index 00000000000000..56638b25cd5e01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en_5.5.0_3.0_1727093823964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp_en_5.5.0_3.0_1727093823964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut92ut1_pl0stlarge42_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut92ut1_PL0stlarge42_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_en.md new file mode 100644 index 00000000000000..31df0615c8602b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_pii_200 DistilBertForTokenClassification from modeldev +author: John Snow Labs +name: distilbert_base_uncased_pii_200 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_pii_200` is a English model originally trained by modeldev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_pii_200_en_5.5.0_3.0_1727065444115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_pii_200_en_5.5.0_3.0_1727065444115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_pii_200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_pii_200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_pii_200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| + +## References + +https://huggingface.co/modeldev/distilbert-base-uncased-pii-200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_pipeline_en.md new file mode 100644 index 00000000000000..a246f1dbf69086 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_pii_200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_pii_200_pipeline pipeline DistilBertForTokenClassification from modeldev +author: John Snow Labs +name: distilbert_base_uncased_pii_200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_pii_200_pipeline` is a English model originally trained by modeldev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_pii_200_pipeline_en_5.5.0_3.0_1727065455623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_pii_200_pipeline_en_5.5.0_3.0_1727065455623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_pii_200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_pii_200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_pii_200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.6 MB| + +## References + +https://huggingface.co/modeldev/distilbert-base-uncased-pii-200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_en.md new file mode 100644 index 00000000000000..eca1f141a206ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_resumesclasssifierv1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_resumesclasssifierv1 DistilBertForSequenceClassification from youssefkhalil320 +author: John Snow Labs +name: distilbert_base_uncased_resumesclasssifierv1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_resumesclasssifierv1` is a English model originally trained by youssefkhalil320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_resumesclasssifierv1_en_5.5.0_3.0_1727073670686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_resumesclasssifierv1_en_5.5.0_3.0_1727073670686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_resumesclasssifierv1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_resumesclasssifierv1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_resumesclasssifierv1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/youssefkhalil320/distilbert-base-uncased-resumesClasssifierV1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sancho3010_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sancho3010_en.md new file mode 100644 index 00000000000000..2a28e868b51b0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sancho3010_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_sancho3010 DistilBertForSequenceClassification from Sancho3010 +author: John Snow Labs +name: distilbert_base_uncased_sancho3010 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sancho3010` is a English model originally trained by Sancho3010. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sancho3010_en_5.5.0_3.0_1727059119274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sancho3010_en_5.5.0_3.0_1727059119274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sancho3010","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sancho3010", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sancho3010| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Sancho3010/distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_en.md new file mode 100644 index 00000000000000..ff667563419607 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_sbulut DistilBertForSequenceClassification from sbulut +author: John Snow Labs +name: distilbert_base_uncased_sbulut +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sbulut` is a English model originally trained by sbulut. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sbulut_en_5.5.0_3.0_1727087221032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sbulut_en_5.5.0_3.0_1727087221032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sbulut","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_sbulut", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sbulut| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sbulut/distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_pipeline_en.md new file mode 100644 index 00000000000000..f4f4f5376e4768 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_sbulut_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_sbulut_pipeline pipeline DistilBertForSequenceClassification from sbulut +author: John Snow Labs +name: distilbert_base_uncased_sbulut_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_sbulut_pipeline` is a English model originally trained by sbulut. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sbulut_pipeline_en_5.5.0_3.0_1727087232708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_sbulut_pipeline_en_5.5.0_3.0_1727087232708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_sbulut_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_sbulut_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_sbulut_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sbulut/distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_en.md new file mode 100644 index 00000000000000..b4fd2c981d99a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_thienle DistilBertForSequenceClassification from thienlelong +author: John Snow Labs +name: distilbert_base_uncased_thienle +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_thienle` is a English model originally trained by thienlelong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_thienle_en_5.5.0_3.0_1727059121707.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_thienle_en_5.5.0_3.0_1727059121707.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_thienle","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_thienle", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_thienle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thienlelong/distilbert-base-uncased-thienle \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_pipeline_en.md new file mode 100644 index 00000000000000..c0bc73c4b92be4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_thienle_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_thienle_pipeline pipeline DistilBertForSequenceClassification from thienlelong +author: John Snow Labs +name: distilbert_base_uncased_thienle_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_thienle_pipeline` is a English model originally trained by thienlelong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_thienle_pipeline_en_5.5.0_3.0_1727059139704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_thienle_pipeline_en_5.5.0_3.0_1727059139704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_thienle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_thienle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_thienle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thienlelong/distilbert-base-uncased-thienle + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_en.md new file mode 100644 index 00000000000000..d43f5ef99bc287 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: English distilbert_base_uncased_three_v2_fix DistilBertForTokenClassification from devtibo +author: John Snow Labs +name: distilbert_base_uncased_three_v2_fix +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_three_v2_fix` is a English model originally trained by devtibo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_three_v2_fix_en_5.5.0_3.0_1727065438463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_three_v2_fix_en_5.5.0_3.0_1727065438463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_three_v2_fix","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("distilbert_base_uncased_three_v2_fix", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_three_v2_fix| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.6 MB| + +## References + +References + +https://huggingface.co/devtibo/distilbert-base-uncased-three-v2-fix \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_pipeline_en.md new file mode 100644 index 00000000000000..2fe798dcc6525d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_three_v2_fix_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English distilbert_base_uncased_three_v2_fix_pipeline pipeline DistilBertForTokenClassification from devtibo +author: John Snow Labs +name: distilbert_base_uncased_three_v2_fix_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_three_v2_fix_pipeline` is a English model originally trained by devtibo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_three_v2_fix_pipeline_en_5.5.0_3.0_1727065450037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_three_v2_fix_pipeline_en_5.5.0_3.0_1727065450037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("distilbert_base_uncased_three_v2_fix_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("distilbert_base_uncased_three_v2_fix_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_three_v2_fix_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.6 MB| + +## References + +References + +https://huggingface.co/devtibo/distilbert-base-uncased-three-v2-fix + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en.md new file mode 100644 index 00000000000000..1ae3a90d759f25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en_5.5.0_3.0_1727087339017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp_en_5.5.0_3.0_1727087339017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut52ut1_ad7_sp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut52ut1_ad7_sp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en.md new file mode 100644 index 00000000000000..e58758ac956c24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en_5.5.0_3.0_1727108566162.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp_en_5.5.0_3.0_1727108566162.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_plainprefix_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_plainPrefix_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_carlosramirez2112_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_carlosramirez2112_en.md new file mode 100644 index 00000000000000..d961fd0a782d63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_carlosramirez2112_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_carlosramirez2112 DistilBertForSequenceClassification from carlosramirez2112 +author: John Snow Labs +name: distilbert_emotion_carlosramirez2112 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_carlosramirez2112` is a English model originally trained by carlosramirez2112. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_carlosramirez2112_en_5.5.0_3.0_1727086752254.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_carlosramirez2112_en_5.5.0_3.0_1727086752254.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_carlosramirez2112","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_carlosramirez2112", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_carlosramirez2112| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/carlosramirez2112/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_en.md new file mode 100644 index 00000000000000..d9f7568c1842e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_gthivaios_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_gthivaios DistilBertForSequenceClassification from gthivaios +author: John Snow Labs +name: distilbert_emotion_gthivaios +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_gthivaios` is a English model originally trained by gthivaios. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_gthivaios_en_5.5.0_3.0_1727087099729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_gthivaios_en_5.5.0_3.0_1727087099729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_gthivaios","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_gthivaios", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_gthivaios| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gthivaios/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_en.md new file mode 100644 index 00000000000000..bb18d1353cb61c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_scaaseu DistilBertForSequenceClassification from scaaseu +author: John Snow Labs +name: distilbert_emotion_scaaseu +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_scaaseu` is a English model originally trained by scaaseu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_scaaseu_en_5.5.0_3.0_1727073517763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_scaaseu_en_5.5.0_3.0_1727073517763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_scaaseu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_emotion_scaaseu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_scaaseu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/scaaseu/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_pipeline_en.md new file mode 100644 index 00000000000000..6548ce610bfaab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_emotion_scaaseu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_scaaseu_pipeline pipeline DistilBertForSequenceClassification from scaaseu +author: John Snow Labs +name: distilbert_emotion_scaaseu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_scaaseu_pipeline` is a English model originally trained by scaaseu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_scaaseu_pipeline_en_5.5.0_3.0_1727073535771.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_scaaseu_pipeline_en_5.5.0_3.0_1727073535771.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_emotion_scaaseu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_emotion_scaaseu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_scaaseu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/scaaseu/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_finetuned_go_emotions_dataset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_finetuned_go_emotions_dataset_pipeline_en.md new file mode 100644 index 00000000000000..f6cada97fd3c00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_finetuned_go_emotions_dataset_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_go_emotions_dataset_pipeline pipeline DistilBertForSequenceClassification from abdurrahman22224 +author: John Snow Labs +name: distilbert_finetuned_go_emotions_dataset_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_go_emotions_dataset_pipeline` is a English model originally trained by abdurrahman22224. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_go_emotions_dataset_pipeline_en_5.5.0_3.0_1727087018911.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_go_emotions_dataset_pipeline_en_5.5.0_3.0_1727087018911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_go_emotions_dataset_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_go_emotions_dataset_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_go_emotions_dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/abdurrahman22224/distilbert-finetuned-go-emotions_dataset + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_en.md new file mode 100644 index 00000000000000..e4f7958232eed4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_kuma9831_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_kuma9831 DistilBertForSequenceClassification from kuma9831 +author: John Snow Labs +name: distilbert_imdb_kuma9831 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_kuma9831` is a English model originally trained by kuma9831. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_kuma9831_en_5.5.0_3.0_1727097248222.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_kuma9831_en_5.5.0_3.0_1727097248222.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_kuma9831","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_kuma9831", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_kuma9831| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/kuma9831/distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_en.md new file mode 100644 index 00000000000000..baff320e758cbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_pd_books_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_pd_books DistilBertForSequenceClassification from Gaxys +author: John Snow Labs +name: distilbert_pd_books +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_pd_books` is a English model originally trained by Gaxys. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_pd_books_en_5.5.0_3.0_1727087226968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_pd_books_en_5.5.0_3.0_1727087226968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_pd_books","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_pd_books", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_pd_books| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Gaxys/DistilBERT-PD_Books \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en.md new file mode 100644 index 00000000000000..b5be68090efbed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en_5.5.0_3.0_1727082200538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256_en_5.5.0_3.0_1727082200538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_cola_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_cola_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_en.md new file mode 100644 index 00000000000000..4c7975eb66fe55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_ft_gaming RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_gaming +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_gaming` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_gaming_en_5.5.0_3.0_1727065957320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_gaming_en_5.5.0_3.0_1727065957320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_gaming","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_ft_gaming","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_gaming| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-gaming \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_pipeline_en.md new file mode 100644 index 00000000000000..6196a3b3052b21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_ft_gaming_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_gaming_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_gaming_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_gaming_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_gaming_pipeline_en_5.5.0_3.0_1727065971483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_gaming_pipeline_en_5.5.0_3.0_1727065971483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_gaming_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_gaming_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_gaming_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-gaming + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_model_transcript_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_model_transcript_en.md new file mode 100644 index 00000000000000..22666156d897b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_base_model_transcript_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_model_transcript RoBertaEmbeddings from mahaamami +author: John Snow Labs +name: distilroberta_base_model_transcript +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_model_transcript` is a English model originally trained by mahaamami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_model_transcript_en_5.5.0_3.0_1727065862431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_model_transcript_en_5.5.0_3.0_1727065862431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_model_transcript","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_model_transcript","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_model_transcript| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/mahaamami/distilroberta-base-model-transcript \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_en.md new file mode 100644 index 00000000000000..2d33cc45694392 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_pr200k_ep20_reuters_bloomberg RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_pr200k_ep20_reuters_bloomberg +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_pr200k_ep20_reuters_bloomberg` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_pr200k_ep20_reuters_bloomberg_en_5.5.0_3.0_1727091853346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_pr200k_ep20_reuters_bloomberg_en_5.5.0_3.0_1727091853346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_pr200k_ep20_reuters_bloomberg","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_pr200k_ep20_reuters_bloomberg","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_pr200k_ep20_reuters_bloomberg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-pr200k-ep20-reuters-bloomberg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en.md new file mode 100644 index 00000000000000..e06422c59196d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_pr200k_ep20_reuters_bloomberg_pipeline pipeline RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_pr200k_ep20_reuters_bloomberg_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_pr200k_ep20_reuters_bloomberg_pipeline` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en_5.5.0_3.0_1727091868200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_pr200k_ep20_reuters_bloomberg_pipeline_en_5.5.0_3.0_1727091868200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_pr200k_ep20_reuters_bloomberg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_pr200k_ep20_reuters_bloomberg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_pr200k_ep20_reuters_bloomberg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.4 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-pr200k-ep20-reuters-bloomberg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilroberta_rb156k_opt15_ep20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_rb156k_opt15_ep20_pipeline_en.md new file mode 100644 index 00000000000000..49ea282875f861 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilroberta_rb156k_opt15_ep20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_rb156k_opt15_ep20_pipeline pipeline RoBertaEmbeddings from judy93536 +author: John Snow Labs +name: distilroberta_rb156k_opt15_ep20_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_rb156k_opt15_ep20_pipeline` is a English model originally trained by judy93536. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep20_pipeline_en_5.5.0_3.0_1727092014779.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_rb156k_opt15_ep20_pipeline_en_5.5.0_3.0_1727092014779.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_rb156k_opt15_ep20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_rb156k_opt15_ep20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_rb156k_opt15_ep20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.0 MB| + +## References + +https://huggingface.co/judy93536/distilroberta-rb156k-opt15-ep20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_en.md new file mode 100644 index 00000000000000..b3592bbe8ca64a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-dp_roberta_large_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dp_roberta_large_finetuned RoBertaForSequenceClassification from GRMenon +author: John Snow Labs +name: dp_roberta_large_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dp_roberta_large_finetuned` is a English model originally trained by GRMenon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dp_roberta_large_finetuned_en_5.5.0_3.0_1727085991811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dp_roberta_large_finetuned_en_5.5.0_3.0_1727085991811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("dp_roberta_large_finetuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("dp_roberta_large_finetuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dp_roberta_large_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/GRMenon/dp-roberta-large-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_en.md b/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_en.md new file mode 100644 index 00000000000000..e5417728713839 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English efficient_mlm_m0_40_finetuned_cola RoBertaForSequenceClassification from QGXQ +author: John Snow Labs +name: efficient_mlm_m0_40_finetuned_cola +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_40_finetuned_cola` is a English model originally trained by QGXQ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_finetuned_cola_en_5.5.0_3.0_1727055277613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_finetuned_cola_en_5.5.0_3.0_1727055277613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("efficient_mlm_m0_40_finetuned_cola","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("efficient_mlm_m0_40_finetuned_cola", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_40_finetuned_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/QGXQ/efficient_mlm_m0.40-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_pipeline_en.md new file mode 100644 index 00000000000000..70f64e6434f767 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-efficient_mlm_m0_40_finetuned_cola_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English efficient_mlm_m0_40_finetuned_cola_pipeline pipeline RoBertaForSequenceClassification from QGXQ +author: John Snow Labs +name: efficient_mlm_m0_40_finetuned_cola_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`efficient_mlm_m0_40_finetuned_cola_pipeline` is a English model originally trained by QGXQ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_finetuned_cola_pipeline_en_5.5.0_3.0_1727055342728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/efficient_mlm_m0_40_finetuned_cola_pipeline_en_5.5.0_3.0_1727055342728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("efficient_mlm_m0_40_finetuned_cola_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("efficient_mlm_m0_40_finetuned_cola_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|efficient_mlm_m0_40_finetuned_cola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/QGXQ/efficient_mlm_m0.40-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_en.md b/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_en.md new file mode 100644 index 00000000000000..1905103958f5e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ellis_v2_emotion_leadership_multi_label_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ellis_v2_emotion_leadership_multi_label DistilBertForSequenceClassification from gsl22 +author: John Snow Labs +name: ellis_v2_emotion_leadership_multi_label +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ellis_v2_emotion_leadership_multi_label` is a English model originally trained by gsl22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ellis_v2_emotion_leadership_multi_label_en_5.5.0_3.0_1727082268801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ellis_v2_emotion_leadership_multi_label_en_5.5.0_3.0_1727082268801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("ellis_v2_emotion_leadership_multi_label","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("ellis_v2_emotion_leadership_multi_label", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ellis_v2_emotion_leadership_multi_label| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gsl22/ellis-v2-emotion-leadership-multi-label \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_en.md b/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_en.md new file mode 100644 index 00000000000000..237adc198063e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English emotions_roberta_test_4 RoBertaForSequenceClassification from Zeyu2000 +author: John Snow Labs +name: emotions_roberta_test_4 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotions_roberta_test_4` is a English model originally trained by Zeyu2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotions_roberta_test_4_en_5.5.0_3.0_1727054829063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotions_roberta_test_4_en_5.5.0_3.0_1727054829063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotions_roberta_test_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("emotions_roberta_test_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotions_roberta_test_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|450.3 MB| + +## References + +https://huggingface.co/Zeyu2000/emotions-roberta-test-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_pipeline_en.md new file mode 100644 index 00000000000000..46e76a321683e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-emotions_roberta_test_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotions_roberta_test_4_pipeline pipeline RoBertaForSequenceClassification from Zeyu2000 +author: John Snow Labs +name: emotions_roberta_test_4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotions_roberta_test_4_pipeline` is a English model originally trained by Zeyu2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotions_roberta_test_4_pipeline_en_5.5.0_3.0_1727054851569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotions_roberta_test_4_pipeline_en_5.5.0_3.0_1727054851569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emotions_roberta_test_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emotions_roberta_test_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotions_roberta_test_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|450.3 MB| + +## References + +https://huggingface.co/Zeyu2000/emotions-roberta-test-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_en.md b/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_en.md new file mode 100644 index 00000000000000..1b29a2cd53df48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English enlm_roberta_imdb_final XlmRoBertaForSequenceClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_imdb_final +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enlm_roberta_imdb_final` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_imdb_final_en_5.5.0_3.0_1727088093590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_imdb_final_en_5.5.0_3.0_1727088093590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("enlm_roberta_imdb_final","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("enlm_roberta_imdb_final", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_imdb_final| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.5 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-imdb-final \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_pipeline_en.md new file mode 100644 index 00000000000000..d6a99bef795f16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-enlm_roberta_imdb_final_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English enlm_roberta_imdb_final_pipeline pipeline XlmRoBertaForSequenceClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_imdb_final_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`enlm_roberta_imdb_final_pipeline` is a English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_imdb_final_pipeline_en_5.5.0_3.0_1727088120024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_imdb_final_pipeline_en_5.5.0_3.0_1727088120024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("enlm_roberta_imdb_final_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("enlm_roberta_imdb_final_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_imdb_final_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.6 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-imdb-final + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fine_tune_whisper_small_sania67_en.md b/docs/_posts/ahmedlone127/2024-09-23-fine_tune_whisper_small_sania67_en.md new file mode 100644 index 00000000000000..1ff2d57dc8be5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fine_tune_whisper_small_sania67_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English fine_tune_whisper_small_sania67 WhisperForCTC from Sania67 +author: John Snow Labs +name: fine_tune_whisper_small_sania67 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tune_whisper_small_sania67` is a English model originally trained by Sania67. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tune_whisper_small_sania67_en_5.5.0_3.0_1727076190927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tune_whisper_small_sania67_en_5.5.0_3.0_1727076190927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("fine_tune_whisper_small_sania67","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("fine_tune_whisper_small_sania67", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tune_whisper_small_sania67| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Sania67/Fine_tune_whisper_small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en.md new file mode 100644 index 00000000000000..c96c44b715e302 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner RoBertaForTokenClassification from manucos +author: John Snow Labs +name: finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en_5.5.0_3.0_1727081601983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_en_5.5.0_3.0_1727081601983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/manucos/finetuned__roberta-clinical-wl-es__augmented-ultrasounds-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_en.md new file mode 100644 index 00000000000000..e93f7621f1f101 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_bio_clinicalbert_2012i2b2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_bio_clinicalbert_2012i2b2 BertForTokenClassification from xiaojingduan +author: John Snow Labs +name: finetuned_bio_clinicalbert_2012i2b2 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bio_clinicalbert_2012i2b2` is a English model originally trained by xiaojingduan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bio_clinicalbert_2012i2b2_en_5.5.0_3.0_1727060787463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bio_clinicalbert_2012i2b2_en_5.5.0_3.0_1727060787463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("finetuned_bio_clinicalbert_2012i2b2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("finetuned_bio_clinicalbert_2012i2b2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bio_clinicalbert_2012i2b2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/xiaojingduan/finetuned_bio_clinicalbert_2012i2b2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_en.md new file mode 100644 index 00000000000000..f38bb4d72abf73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_model_on_4k_samples DistilBertForSequenceClassification from Wolverine001 +author: John Snow Labs +name: finetuned_model_on_4k_samples +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_model_on_4k_samples` is a English model originally trained by Wolverine001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_model_on_4k_samples_en_5.5.0_3.0_1727059646912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_model_on_4k_samples_en_5.5.0_3.0_1727059646912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_model_on_4k_samples","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_model_on_4k_samples", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_model_on_4k_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Wolverine001/finetuned_model_on-4k-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_keinpyisi_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_keinpyisi_en.md new file mode 100644 index 00000000000000..18fee8b52a851b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_emotion_model_keinpyisi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_emotion_model_keinpyisi DistilBertForSequenceClassification from keinpyisi +author: John Snow Labs +name: finetuning_emotion_model_keinpyisi +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_emotion_model_keinpyisi` is a English model originally trained by keinpyisi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_keinpyisi_en_5.5.0_3.0_1727094018184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_emotion_model_keinpyisi_en_5.5.0_3.0_1727094018184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_emotion_model_keinpyisi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_emotion_model_keinpyisi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_emotion_model_keinpyisi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/keinpyisi/finetuning-emotion-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_analysis_model_team_28_mekteck_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_analysis_model_team_28_mekteck_en.md new file mode 100644 index 00000000000000..34d2b599650e1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_analysis_model_team_28_mekteck_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_model_team_28_mekteck DistilBertForSequenceClassification from Mekteck +author: John Snow Labs +name: finetuning_sentiment_analysis_model_team_28_mekteck +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_model_team_28_mekteck` is a English model originally trained by Mekteck. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_team_28_mekteck_en_5.5.0_3.0_1727093817128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_model_team_28_mekteck_en_5.5.0_3.0_1727093817128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_model_team_28_mekteck","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_analysis_model_team_28_mekteck", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_model_team_28_mekteck| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mekteck/finetuning-sentiment-analysis-model-team-28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_en.md new file mode 100644 index 00000000000000..9c1f43ba19bdaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ducpham1501 DistilBertForSequenceClassification from DucPham1501 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ducpham1501 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ducpham1501` is a English model originally trained by DucPham1501. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ducpham1501_en_5.5.0_3.0_1727087281383.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ducpham1501_en_5.5.0_3.0_1727087281383.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ducpham1501","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ducpham1501", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ducpham1501| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DucPham1501/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_en.md new file mode 100644 index 00000000000000..a69b32765d4d56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ginosky DistilBertForSequenceClassification from ginosky +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ginosky +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ginosky` is a English model originally trained by ginosky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ginosky_en_5.5.0_3.0_1727073714240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ginosky_en_5.5.0_3.0_1727073714240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ginosky","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ginosky", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ginosky| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ginosky/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_pipeline_en.md new file mode 100644 index 00000000000000..99034fc252d19f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ginosky_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ginosky_pipeline pipeline DistilBertForSequenceClassification from ginosky +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ginosky_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ginosky_pipeline` is a English model originally trained by ginosky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ginosky_pipeline_en_5.5.0_3.0_1727073726626.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ginosky_pipeline_en_5.5.0_3.0_1727073726626.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ginosky_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ginosky_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ginosky_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ginosky/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_pipeline_en.md new file mode 100644 index 00000000000000..4cd48b5bf2f75f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_h9v8_pipeline pipeline DistilBertForSequenceClassification from H9V8 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_h9v8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_h9v8_pipeline` is a English model originally trained by H9V8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_h9v8_pipeline_en_5.5.0_3.0_1727097250957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_h9v8_pipeline_en_5.5.0_3.0_1727097250957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_h9v8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_h9v8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_h9v8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/H9V8/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en.md new file mode 100644 index 00000000000000..ba31eb5aa7cb1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ritesh47_pipeline pipeline DistilBertForSequenceClassification from ritesh47 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ritesh47_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ritesh47_pipeline` is a English model originally trained by ritesh47. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en_5.5.0_3.0_1727073645874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ritesh47_pipeline_en_5.5.0_3.0_1727073645874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ritesh47_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ritesh47_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ritesh47_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ritesh47/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en.md new file mode 100644 index 00000000000000..6a4356e0f3e3cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline pipeline DistilBertForSequenceClassification from thanhchauns2 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline` is a English model originally trained by thanhchauns2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en_5.5.0_3.0_1727082623262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline_en_5.5.0_3.0_1727082623262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_thanhchauns2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thanhchauns2/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_en.md new file mode 100644 index 00000000000000..81f344d1f4fffd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon_cm DistilBertForSequenceClassification from abyesses +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon_cm +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon_cm` is a English model originally trained by abyesses. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_cm_en_5.5.0_3.0_1727096946966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_cm_en_5.5.0_3.0_1727096946966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon_cm","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazon_cm", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon_cm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abyesses/finetuning-sentiment-model-5000-amazon_cm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_en.md b/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_en.md new file mode 100644 index 00000000000000..72bc852d8c5665 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English graphcodebert_base_german RoBertaEmbeddings from meloneneneis +author: John Snow Labs +name: graphcodebert_base_german +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`graphcodebert_base_german` is a English model originally trained by meloneneneis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/graphcodebert_base_german_en_5.5.0_3.0_1727080576847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/graphcodebert_base_german_en_5.5.0_3.0_1727080576847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("graphcodebert_base_german","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("graphcodebert_base_german","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|graphcodebert_base_german| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|480.3 MB| + +## References + +https://huggingface.co/meloneneneis/graphcodebert-base-german \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_pipeline_en.md new file mode 100644 index 00000000000000..74903d4804e43f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-graphcodebert_base_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English graphcodebert_base_german_pipeline pipeline RoBertaEmbeddings from meloneneneis +author: John Snow Labs +name: graphcodebert_base_german_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`graphcodebert_base_german_pipeline` is a English model originally trained by meloneneneis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/graphcodebert_base_german_pipeline_en_5.5.0_3.0_1727080599841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/graphcodebert_base_german_pipeline_en_5.5.0_3.0_1727080599841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("graphcodebert_base_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("graphcodebert_base_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|graphcodebert_base_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|480.3 MB| + +## References + +https://huggingface.co/meloneneneis/graphcodebert-base-german + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_pipeline_en.md new file mode 100644 index 00000000000000..1a695e14a2a2f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-groberta_goemotions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English groberta_goemotions_pipeline pipeline RoBertaForSequenceClassification from Mukundhan32 +author: John Snow Labs +name: groberta_goemotions_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`groberta_goemotions_pipeline` is a English model originally trained by Mukundhan32. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/groberta_goemotions_pipeline_en_5.5.0_3.0_1727085834370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/groberta_goemotions_pipeline_en_5.5.0_3.0_1727085834370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("groberta_goemotions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("groberta_goemotions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|groberta_goemotions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|453.4 MB| + +## References + +https://huggingface.co/Mukundhan32/Groberta-goemotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-happy_en.md b/docs/_posts/ahmedlone127/2024-09-23-happy_en.md new file mode 100644 index 00000000000000..918b3c779cc0c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-happy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English happy RoBertaEmbeddings from MatthijsN +author: John Snow Labs +name: happy +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`happy` is a English model originally trained by MatthijsN. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/happy_en_5.5.0_3.0_1727057061183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/happy_en_5.5.0_3.0_1727057061183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("happy","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("happy","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|happy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/MatthijsN/happy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_pipeline_en.md new file mode 100644 index 00000000000000..020d7fa5c49456 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English harmful_content_trainer_pipeline pipeline DistilBertForSequenceClassification from AIUs3r0 +author: John Snow Labs +name: harmful_content_trainer_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`harmful_content_trainer_pipeline` is a English model originally trained by AIUs3r0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/harmful_content_trainer_pipeline_en_5.5.0_3.0_1727059860413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/harmful_content_trainer_pipeline_en_5.5.0_3.0_1727059860413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("harmful_content_trainer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("harmful_content_trainer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|harmful_content_trainer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AIUs3r0/Harmful_Content_Trainer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_en.md new file mode 100644 index 00000000000000..7c8f27a191dc5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed0_bertweet_large RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed0_bertweet_large +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed0_bertweet_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bertweet_large_en_5.5.0_3.0_1727055095050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bertweet_large_en_5.5.0_3.0_1727055095050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed0_bertweet_large","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random0_seed0_bertweet_large", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed0_bertweet_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed0-bertweet-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_pipeline_en.md new file mode 100644 index 00000000000000..6d20c59d82f608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random0_seed0_bertweet_large_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random0_seed0_bertweet_large_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random0_seed0_bertweet_large_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random0_seed0_bertweet_large_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727055179350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random0_seed0_bertweet_large_pipeline_en_5.5.0_3.0_1727055179350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random0_seed0_bertweet_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random0_seed0_bertweet_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random0_seed0_bertweet_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random0_seed0-bertweet-large + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en.md new file mode 100644 index 00000000000000..71a769de9178a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1727055765647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1727055765647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random1_seed1-twitter-roberta-large-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..2879babf4000b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1727055826494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline_en_5.5.0_3.0_1727055826494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random1_seed1_twitter_roberta_large_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random1_seed1-twitter-roberta-large-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en.md new file mode 100644 index 00000000000000..16e09992363aa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1727055499145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline_en_5.5.0_3.0_1727055499145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random2_seed0_twitter_roberta_base_2021_124m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random2_seed0-twitter-roberta-base-2021-124m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_en.md b/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_en.md new file mode 100644 index 00000000000000..afa6a57ee8ec92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hindi_bpe_bert_test_2m BertEmbeddings from rg1683 +author: John Snow Labs +name: hindi_bpe_bert_test_2m +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_bpe_bert_test_2m` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_bpe_bert_test_2m_en_5.5.0_3.0_1727107688305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_bpe_bert_test_2m_en_5.5.0_3.0_1727107688305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("hindi_bpe_bert_test_2m","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("hindi_bpe_bert_test_2m","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_bpe_bert_test_2m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|377.8 MB| + +## References + +https://huggingface.co/rg1683/hindi_bpe_bert_test_2m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_pipeline_en.md new file mode 100644 index 00000000000000..66ef7d06068c2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hindi_bpe_bert_test_2m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hindi_bpe_bert_test_2m_pipeline pipeline BertEmbeddings from rg1683 +author: John Snow Labs +name: hindi_bpe_bert_test_2m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_bpe_bert_test_2m_pipeline` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_bpe_bert_test_2m_pipeline_en_5.5.0_3.0_1727107705675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_bpe_bert_test_2m_pipeline_en_5.5.0_3.0_1727107705675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hindi_bpe_bert_test_2m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hindi_bpe_bert_test_2m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_bpe_bert_test_2m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|377.8 MB| + +## References + +https://huggingface.co/rg1683/hindi_bpe_bert_test_2m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_pipeline_el.md b/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_pipeline_el.md new file mode 100644 index 00000000000000..1b162b89532230 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_pipeline_el.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Modern Greek (1453-) humor_recognition_greek_distilbert_pipeline pipeline DistilBertForSequenceClassification from Kalloniatis +author: John Snow Labs +name: humor_recognition_greek_distilbert_pipeline +date: 2024-09-23 +tags: [el, open_source, pipeline, onnx] +task: Text Classification +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`humor_recognition_greek_distilbert_pipeline` is a Modern Greek (1453-) model originally trained by Kalloniatis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/humor_recognition_greek_distilbert_pipeline_el_5.5.0_3.0_1727074201489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/humor_recognition_greek_distilbert_pipeline_el_5.5.0_3.0_1727074201489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("humor_recognition_greek_distilbert_pipeline", lang = "el") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("humor_recognition_greek_distilbert_pipeline", lang = "el") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|humor_recognition_greek_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|el| +|Size:|507.6 MB| + +## References + +https://huggingface.co/Kalloniatis/Humor-Recognition-Greek-DistilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_en.md new file mode 100644 index 00000000000000..d2be1409d91e0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw1223_01 DistilBertForSequenceClassification from tunyu +author: John Snow Labs +name: hw1223_01 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw1223_01` is a English model originally trained by tunyu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw1223_01_en_5.5.0_3.0_1727059540790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw1223_01_en_5.5.0_3.0_1727059540790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw1223_01","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw1223_01", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw1223_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tunyu/HW1223_01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_en.md new file mode 100644 index 00000000000000..c9964578560936 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw_1_irisliou DistilBertForSequenceClassification from IrisLiou +author: John Snow Labs +name: hw_1_irisliou +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw_1_irisliou` is a English model originally trained by IrisLiou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw_1_irisliou_en_5.5.0_3.0_1727093820951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw_1_irisliou_en_5.5.0_3.0_1727093820951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw_1_irisliou","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw_1_irisliou", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw_1_irisliou| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/IrisLiou/hw-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_en.md b/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_en.md new file mode 100644 index 00000000000000..d19e6b1c3178a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English id2223_lab2_whisper_nelanbu WhisperForCTC from nelanbu +author: John Snow Labs +name: id2223_lab2_whisper_nelanbu +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`id2223_lab2_whisper_nelanbu` is a English model originally trained by nelanbu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/id2223_lab2_whisper_nelanbu_en_5.5.0_3.0_1727051979457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/id2223_lab2_whisper_nelanbu_en_5.5.0_3.0_1727051979457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("id2223_lab2_whisper_nelanbu","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("id2223_lab2_whisper_nelanbu", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|id2223_lab2_whisper_nelanbu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/nelanbu/ID2223_Lab2_Whisper \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_en.md b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_en.md new file mode 100644 index 00000000000000..61976dcb02087a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_v02_mancd DistilBertForSequenceClassification from ManCD +author: John Snow Labs +name: imdbreviews_classification_distilbert_v02_mancd +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_v02_mancd` is a English model originally trained by ManCD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_mancd_en_5.5.0_3.0_1727093814161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_mancd_en_5.5.0_3.0_1727093814161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdbreviews_classification_distilbert_v02_mancd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("imdbreviews_classification_distilbert_v02_mancd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_v02_mancd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ManCD/imdbreviews_classification_distilbert_v02 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_pipeline_en.md new file mode 100644 index 00000000000000..8c5c52407f464c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-imdbreviews_classification_distilbert_v02_mancd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English imdbreviews_classification_distilbert_v02_mancd_pipeline pipeline DistilBertForSequenceClassification from ManCD +author: John Snow Labs +name: imdbreviews_classification_distilbert_v02_mancd_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_distilbert_v02_mancd_pipeline` is a English model originally trained by ManCD. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_mancd_pipeline_en_5.5.0_3.0_1727093825749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_distilbert_v02_mancd_pipeline_en_5.5.0_3.0_1727093825749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("imdbreviews_classification_distilbert_v02_mancd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("imdbreviews_classification_distilbert_v02_mancd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_distilbert_v02_mancd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ManCD/imdbreviews_classification_distilbert_v02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_pipeline_en.md new file mode 100644 index 00000000000000..0aec2fc2d4e865 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English inisw08_robert_mlm_adamw_torch_bs8_pipeline pipeline RoBertaEmbeddings from ugiugi +author: John Snow Labs +name: inisw08_robert_mlm_adamw_torch_bs8_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inisw08_robert_mlm_adamw_torch_bs8_pipeline` is a English model originally trained by ugiugi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adamw_torch_bs8_pipeline_en_5.5.0_3.0_1727066126742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adamw_torch_bs8_pipeline_en_5.5.0_3.0_1727066126742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("inisw08_robert_mlm_adamw_torch_bs8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("inisw08_robert_mlm_adamw_torch_bs8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inisw08_robert_mlm_adamw_torch_bs8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.7 MB| + +## References + +https://huggingface.co/ugiugi/inisw08-RoBERT-mlm-adamw_torch_bs8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-interview_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-23-interview_classifier_en.md new file mode 100644 index 00000000000000..9f099fc38c1cb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-interview_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English interview_classifier DistilBertForSequenceClassification from eskayML +author: John Snow Labs +name: interview_classifier +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interview_classifier` is a English model originally trained by eskayML. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interview_classifier_en_5.5.0_3.0_1727073517281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interview_classifier_en_5.5.0_3.0_1727073517281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("interview_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("interview_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interview_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eskayML/interview_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_pipeline_hi.md new file mode 100644 index 00000000000000..aebf71baf5d78e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab2_whisper_swedish_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi lab2_whisper_swedish_pipeline pipeline WhisperForCTC from SodraZatre +author: John Snow Labs +name: lab2_whisper_swedish_pipeline +date: 2024-09-23 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab2_whisper_swedish_pipeline` is a Hindi model originally trained by SodraZatre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab2_whisper_swedish_pipeline_hi_5.5.0_3.0_1727116560650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab2_whisper_swedish_pipeline_hi_5.5.0_3.0_1727116560650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab2_whisper_swedish_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab2_whisper_swedish_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab2_whisper_swedish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/SodraZatre/lab2-whisper-sv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..a5a6680c0ded8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab_11_distilbert_sentiment_pipeline pipeline DistilBertForSequenceClassification from Malecc +author: John Snow Labs +name: lab_11_distilbert_sentiment_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab_11_distilbert_sentiment_pipeline` is a English model originally trained by Malecc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab_11_distilbert_sentiment_pipeline_en_5.5.0_3.0_1727097190060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab_11_distilbert_sentiment_pipeline_en_5.5.0_3.0_1727097190060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab_11_distilbert_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab_11_distilbert_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab_11_distilbert_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Malecc/lab_11_distilbert_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_en.md b/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_en.md new file mode 100644 index 00000000000000..8cf33187e85663 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English legal_gqa_bert_augmented_1000 BertForQuestionAnswering from farid1088 +author: John Snow Labs +name: legal_gqa_bert_augmented_1000 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_gqa_bert_augmented_1000` is a English model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_gqa_bert_augmented_1000_en_5.5.0_3.0_1727050234095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_gqa_bert_augmented_1000_en_5.5.0_3.0_1727050234095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("legal_gqa_bert_augmented_1000","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("legal_gqa_bert_augmented_1000", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_gqa_bert_augmented_1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/farid1088/Legal_GQA_BERT_augmented_1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_pipeline_en.md new file mode 100644 index 00000000000000..40ea022b5f4254 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-legal_gqa_bert_augmented_1000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English legal_gqa_bert_augmented_1000_pipeline pipeline BertForQuestionAnswering from farid1088 +author: John Snow Labs +name: legal_gqa_bert_augmented_1000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_gqa_bert_augmented_1000_pipeline` is a English model originally trained by farid1088. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_gqa_bert_augmented_1000_pipeline_en_5.5.0_3.0_1727050257283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_gqa_bert_augmented_1000_pipeline_en_5.5.0_3.0_1727050257283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legal_gqa_bert_augmented_1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legal_gqa_bert_augmented_1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_gqa_bert_augmented_1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/farid1088/Legal_GQA_BERT_augmented_1000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..0d0003b82e9ffa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English liar_fake_news_roberta_base_pipeline pipeline RoBertaEmbeddings from Jawaher +author: John Snow Labs +name: liar_fake_news_roberta_base_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`liar_fake_news_roberta_base_pipeline` is a English model originally trained by Jawaher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/liar_fake_news_roberta_base_pipeline_en_5.5.0_3.0_1727092185302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/liar_fake_news_roberta_base_pipeline_en_5.5.0_3.0_1727092185302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("liar_fake_news_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("liar_fake_news_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|liar_fake_news_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/Jawaher/LIAR-fake-news-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_pipeline_en.md new file mode 100644 index 00000000000000..3f44f1481b697b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English melodea_final_model_pipeline pipeline RoBertaForSequenceClassification from GabiRayman +author: John Snow Labs +name: melodea_final_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`melodea_final_model_pipeline` is a English model originally trained by GabiRayman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/melodea_final_model_pipeline_en_5.5.0_3.0_1727055109373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/melodea_final_model_pipeline_en_5.5.0_3.0_1727055109373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("melodea_final_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("melodea_final_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|melodea_final_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|429.4 MB| + +## References + +https://huggingface.co/GabiRayman/melodea_final-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_en.md new file mode 100644 index 00000000000000..4001a0bdd6102c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mnli_distilled_bart_cross_roberta RoBertaForSequenceClassification from Sayan01 +author: John Snow Labs +name: mnli_distilled_bart_cross_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mnli_distilled_bart_cross_roberta` is a English model originally trained by Sayan01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mnli_distilled_bart_cross_roberta_en_5.5.0_3.0_1727085580049.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mnli_distilled_bart_cross_roberta_en_5.5.0_3.0_1727085580049.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("mnli_distilled_bart_cross_roberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("mnli_distilled_bart_cross_roberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mnli_distilled_bart_cross_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/Sayan01/mnli-distilled-bart-cross-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_pipeline_en.md new file mode 100644 index 00000000000000..cc3f3c4a1bf02a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mnli_distilled_bart_cross_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mnli_distilled_bart_cross_roberta_pipeline pipeline RoBertaForSequenceClassification from Sayan01 +author: John Snow Labs +name: mnli_distilled_bart_cross_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mnli_distilled_bart_cross_roberta_pipeline` is a English model originally trained by Sayan01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mnli_distilled_bart_cross_roberta_pipeline_en_5.5.0_3.0_1727085594749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mnli_distilled_bart_cross_roberta_pipeline_en_5.5.0_3.0_1727085594749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mnli_distilled_bart_cross_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mnli_distilled_bart_cross_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mnli_distilled_bart_cross_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/Sayan01/mnli-distilled-bart-cross-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_en.md new file mode 100644 index 00000000000000..4b8f75836888b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_imdb DistilBertForSequenceClassification from Cippppy +author: John Snow Labs +name: mobilebert_imdb +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_imdb` is a English model originally trained by Cippppy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_imdb_en_5.5.0_3.0_1727086985826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_imdb_en_5.5.0_3.0_1727086985826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("mobilebert_imdb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("mobilebert_imdb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|238.8 MB| + +## References + +https://huggingface.co/Cippppy/mobileBert_imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_pipeline_en.md new file mode 100644 index 00000000000000..fbc7645f21af80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mobilebert_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mobilebert_imdb_pipeline pipeline DistilBertForSequenceClassification from Cippppy +author: John Snow Labs +name: mobilebert_imdb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_imdb_pipeline` is a English model originally trained by Cippppy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_imdb_pipeline_en_5.5.0_3.0_1727087001139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_imdb_pipeline_en_5.5.0_3.0_1727087001139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mobilebert_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mobilebert_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|238.8 MB| + +## References + +https://huggingface.co/Cippppy/mobileBert_imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_en.md new file mode 100644 index 00000000000000..bc6f1c3226a82f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_test_for_upload DistilBertForSequenceClassification from JACOBBBB +author: John Snow Labs +name: model_test_for_upload +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_test_for_upload` is a English model originally trained by JACOBBBB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_test_for_upload_en_5.5.0_3.0_1727086844099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_test_for_upload_en_5.5.0_3.0_1727086844099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_test_for_upload","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_test_for_upload", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_test_for_upload| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JACOBBBB/model_test_for_upload \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_pipeline_en.md new file mode 100644 index 00000000000000..4f531315643a93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-model_test_for_upload_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_test_for_upload_pipeline pipeline DistilBertForSequenceClassification from JACOBBBB +author: John Snow Labs +name: model_test_for_upload_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_test_for_upload_pipeline` is a English model originally trained by JACOBBBB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_test_for_upload_pipeline_en_5.5.0_3.0_1727086855901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_test_for_upload_pipeline_en_5.5.0_3.0_1727086855901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_test_for_upload_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_test_for_upload_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_test_for_upload_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/JACOBBBB/model_test_for_upload + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-msroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-msroberta_pipeline_en.md new file mode 100644 index 00000000000000..2e4ba32a0c6e6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-msroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English msroberta_pipeline pipeline RoBertaEmbeddings from nkoh01 +author: John Snow Labs +name: msroberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`msroberta_pipeline` is a English model originally trained by nkoh01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/msroberta_pipeline_en_5.5.0_3.0_1727056624961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/msroberta_pipeline_en_5.5.0_3.0_1727056624961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("msroberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("msroberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|msroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/nkoh01/MSRoberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_pipeline_en.md new file mode 100644 index 00000000000000..3b4f162859ce80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mt5_base_visp_s2_pipeline pipeline T5Transformer from ngwgsang +author: John Snow Labs +name: mt5_base_visp_s2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mt5_base_visp_s2_pipeline` is a English model originally trained by ngwgsang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s2_pipeline_en_5.5.0_3.0_1727069187914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s2_pipeline_en_5.5.0_3.0_1727069187914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mt5_base_visp_s2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mt5_base_visp_s2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mt5_base_visp_s2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|2.3 GB| + +## References + +https://huggingface.co/ngwgsang/mt5-base-visp-s2 + +## Included Models + +- DocumentAssembler +- T5Transformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_pipeline_hi.md new file mode 100644 index 00000000000000..6a2470aea3797f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi muril_large_chaii_pipeline pipeline BertForQuestionAnswering from abhishek +author: John Snow Labs +name: muril_large_chaii_pipeline +date: 2024-09-23 +tags: [hi, open_source, pipeline, onnx] +task: Question Answering +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`muril_large_chaii_pipeline` is a Hindi model originally trained by abhishek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/muril_large_chaii_pipeline_hi_5.5.0_3.0_1727070855702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/muril_large_chaii_pipeline_hi_5.5.0_3.0_1727070855702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("muril_large_chaii_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("muril_large_chaii_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|muril_large_chaii_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.9 GB| + +## References + +https://huggingface.co/abhishek/muril-large-chaii + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_pipeline_en.md new file mode 100644 index 00000000000000..69524f584658e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding60model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_sst5_padding60model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding60model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding60model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding60model_pipeline_en_5.5.0_3.0_1727093940809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding60model_pipeline_en_5.5.0_3.0_1727093940809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_sst5_padding60model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_sst5_padding60model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding60model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding60model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_pipeline_en.md new file mode 100644 index 00000000000000..cf541633bafc6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_twitterfin_padding30model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_distilbert_twitterfin_padding30model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_twitterfin_padding30model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_twitterfin_padding30model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding30model_pipeline_en_5.5.0_3.0_1727110736186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_twitterfin_padding30model_pipeline_en_5.5.0_3.0_1727110736186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_distilbert_twitterfin_padding30model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_distilbert_twitterfin_padding30model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_twitterfin_padding30model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_twitterfin_padding30model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_en.md b/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_en.md new file mode 100644 index 00000000000000..df408980558369 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nameattrsbert BertForSequenceClassification from madgnome +author: John Snow Labs +name: nameattrsbert +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nameattrsbert` is a English model originally trained by madgnome. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nameattrsbert_en_5.5.0_3.0_1727095464620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nameattrsbert_en_5.5.0_3.0_1727095464620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("nameattrsbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("nameattrsbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nameattrsbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|669.1 MB| + +## References + +https://huggingface.co/madgnome/nameattrsbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_pipeline_en.md new file mode 100644 index 00000000000000..33c7e80fde2433 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nameattrsbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nameattrsbert_pipeline pipeline BertForSequenceClassification from madgnome +author: John Snow Labs +name: nameattrsbert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nameattrsbert_pipeline` is a English model originally trained by madgnome. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nameattrsbert_pipeline_en_5.5.0_3.0_1727095496474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nameattrsbert_pipeline_en_5.5.0_3.0_1727095496474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nameattrsbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nameattrsbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nameattrsbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|669.1 MB| + +## References + +https://huggingface.co/madgnome/nameattrsbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_pipeline_en.md new file mode 100644 index 00000000000000..d271df77dc5aca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_model_vierpiet_pipeline pipeline DistilBertForSequenceClassification from vierpiet +author: John Snow Labs +name: nepal_bhasa_model_vierpiet_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_model_vierpiet_pipeline` is a English model originally trained by vierpiet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_model_vierpiet_pipeline_en_5.5.0_3.0_1727082678736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_model_vierpiet_pipeline_en_5.5.0_3.0_1727082678736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_model_vierpiet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_model_vierpiet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_model_vierpiet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vierpiet/new_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_pipeline_en.md new file mode 100644 index 00000000000000..f42654403d6f08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_phishing_email_detection2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nepal_bhasa_phishing_email_detection2_pipeline pipeline DistilBertForSequenceClassification from kamikaze20 +author: John Snow Labs +name: nepal_bhasa_phishing_email_detection2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_phishing_email_detection2_pipeline` is a English model originally trained by kamikaze20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_phishing_email_detection2_pipeline_en_5.5.0_3.0_1727074271998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_phishing_email_detection2_pipeline_en_5.5.0_3.0_1727074271998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nepal_bhasa_phishing_email_detection2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nepal_bhasa_phishing_email_detection2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_phishing_email_detection2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/kamikaze20/new_phishing-email-detection2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_pretrained_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_pretrained_model_en.md new file mode 100644 index 00000000000000..59129916a9e17a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_pretrained_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_pretrained_model DistilBertForSequenceClassification from Vigneshwaran255 +author: John Snow Labs +name: nepal_bhasa_pretrained_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_pretrained_model` is a English model originally trained by Vigneshwaran255. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_pretrained_model_en_5.5.0_3.0_1727073859354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_pretrained_model_en_5.5.0_3.0_1727073859354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_pretrained_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_pretrained_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_pretrained_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Vigneshwaran255/new_pretrained_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_en.md new file mode 100644 index 00000000000000..7364bed0898fe1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English news_classifier_bert DistilBertForSequenceClassification from MehdiHosseiniMoghadam +author: John Snow Labs +name: news_classifier_bert +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_classifier_bert` is a English model originally trained by MehdiHosseiniMoghadam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_classifier_bert_en_5.5.0_3.0_1727073940171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_classifier_bert_en_5.5.0_3.0_1727073940171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("news_classifier_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("news_classifier_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_classifier_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MehdiHosseiniMoghadam/news_classifier-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_pipeline_en.md new file mode 100644 index 00000000000000..7321d04a7ae7fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-news_classifier_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English news_classifier_bert_pipeline pipeline DistilBertForSequenceClassification from MehdiHosseiniMoghadam +author: John Snow Labs +name: news_classifier_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_classifier_bert_pipeline` is a English model originally trained by MehdiHosseiniMoghadam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_classifier_bert_pipeline_en_5.5.0_3.0_1727073952177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_classifier_bert_pipeline_en_5.5.0_3.0_1727073952177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("news_classifier_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("news_classifier_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_classifier_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MehdiHosseiniMoghadam/news_classifier-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-newsmodelclassification_en.md b/docs/_posts/ahmedlone127/2024-09-23-newsmodelclassification_en.md new file mode 100644 index 00000000000000..a2aa58ce9fa5e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-newsmodelclassification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English newsmodelclassification DistilBertForSequenceClassification from aatmasidha +author: John Snow Labs +name: newsmodelclassification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`newsmodelclassification` is a English model originally trained by aatmasidha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/newsmodelclassification_en_5.5.0_3.0_1727082475431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/newsmodelclassification_en_5.5.0_3.0_1727082475431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("newsmodelclassification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("newsmodelclassification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|newsmodelclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aatmasidha/newsmodelclassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_en.md b/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_en.md new file mode 100644 index 00000000000000..f1cc18b5c0b5de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English oceanprotocol_tweet_impact_v1 RoBertaForSequenceClassification from lucaordronneau +author: John Snow Labs +name: oceanprotocol_tweet_impact_v1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`oceanprotocol_tweet_impact_v1` is a English model originally trained by lucaordronneau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/oceanprotocol_tweet_impact_v1_en_5.5.0_3.0_1727085367829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/oceanprotocol_tweet_impact_v1_en_5.5.0_3.0_1727085367829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("oceanprotocol_tweet_impact_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("oceanprotocol_tweet_impact_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|oceanprotocol_tweet_impact_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/lucaordronneau/oceanprotocol-tweet-impact-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_pipeline_en.md new file mode 100644 index 00000000000000..0ddfb30517cf27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-oceanprotocol_tweet_impact_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English oceanprotocol_tweet_impact_v1_pipeline pipeline RoBertaForSequenceClassification from lucaordronneau +author: John Snow Labs +name: oceanprotocol_tweet_impact_v1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`oceanprotocol_tweet_impact_v1_pipeline` is a English model originally trained by lucaordronneau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/oceanprotocol_tweet_impact_v1_pipeline_en_5.5.0_3.0_1727085390712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/oceanprotocol_tweet_impact_v1_pipeline_en_5.5.0_3.0_1727085390712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("oceanprotocol_tweet_impact_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("oceanprotocol_tweet_impact_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|oceanprotocol_tweet_impact_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.9 MB| + +## References + +https://huggingface.co/lucaordronneau/oceanprotocol-tweet-impact-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-odiaberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-odiaberta_en.md new file mode 100644 index 00000000000000..00ccc7cd1bd13f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-odiaberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English odiaberta RoBertaEmbeddings from Nikhil7280 +author: John Snow Labs +name: odiaberta +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`odiaberta` is a English model originally trained by Nikhil7280. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/odiaberta_en_5.5.0_3.0_1727080662114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/odiaberta_en_5.5.0_3.0_1727080662114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("odiaberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("odiaberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|odiaberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|311.5 MB| + +## References + +https://huggingface.co/Nikhil7280/OdiaBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-odiaberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-odiaberta_pipeline_en.md new file mode 100644 index 00000000000000..decf71420857b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-odiaberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English odiaberta_pipeline pipeline RoBertaEmbeddings from Nikhil7280 +author: John Snow Labs +name: odiaberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`odiaberta_pipeline` is a English model originally trained by Nikhil7280. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/odiaberta_pipeline_en_5.5.0_3.0_1727080677236.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/odiaberta_pipeline_en_5.5.0_3.0_1727080677236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("odiaberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("odiaberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|odiaberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|311.6 MB| + +## References + +https://huggingface.co/Nikhil7280/OdiaBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_en.md b/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_en.md new file mode 100644 index 00000000000000..542738ef763009 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paws_x_xlm_r_only_spanish_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English paws_x_xlm_r_only_spanish XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: paws_x_xlm_r_only_spanish +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paws_x_xlm_r_only_spanish` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paws_x_xlm_r_only_spanish_en_5.5.0_3.0_1727099329600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paws_x_xlm_r_only_spanish_en_5.5.0_3.0_1727099329600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("paws_x_xlm_r_only_spanish","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("paws_x_xlm_r_only_spanish", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paws_x_xlm_r_only_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|803.4 MB| + +## References + +https://huggingface.co/semindan/paws_x_xlm_r_only_es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_pipeline_vi.md new file mode 100644 index 00000000000000..acdbfe675d005a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-phowhisper_small_pipeline_vi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Vietnamese phowhisper_small_pipeline pipeline WhisperForCTC from huuquyet +author: John Snow Labs +name: phowhisper_small_pipeline +date: 2024-09-23 +tags: [vi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phowhisper_small_pipeline` is a Vietnamese model originally trained by huuquyet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phowhisper_small_pipeline_vi_5.5.0_3.0_1727117104510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phowhisper_small_pipeline_vi_5.5.0_3.0_1727117104510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("phowhisper_small_pipeline", lang = "vi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("phowhisper_small_pipeline", lang = "vi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phowhisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/huuquyet/PhoWhisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-phrase_bank_yeah_en.md b/docs/_posts/ahmedlone127/2024-09-23-phrase_bank_yeah_en.md new file mode 100644 index 00000000000000..91fb751d9c2c5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-phrase_bank_yeah_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrase_bank_yeah DistilBertForSequenceClassification from EthanHosier +author: John Snow Labs +name: phrase_bank_yeah +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrase_bank_yeah` is a English model originally trained by EthanHosier. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrase_bank_yeah_en_5.5.0_3.0_1727096947022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrase_bank_yeah_en_5.5.0_3.0_1727096947022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("phrase_bank_yeah","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("phrase_bank_yeah", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrase_bank_yeah| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EthanHosier/phrase_bank_yeah \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_en.md b/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_en.md new file mode 100644 index 00000000000000..7dc88b4cf69a78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English platzi_distilroberta_base_glue_mrpc_eduardo_ag RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_glue_mrpc_eduardo_ag +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_glue_mrpc_eduardo_ag` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_glue_mrpc_eduardo_ag_en_5.5.0_3.0_1727085898144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_glue_mrpc_eduardo_ag_en_5.5.0_3.0_1727085898144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_glue_mrpc_eduardo_ag","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("platzi_distilroberta_base_glue_mrpc_eduardo_ag", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_glue_mrpc_eduardo_ag| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-glue-mrpc-eduardo-ag \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-portuguese_up_xlmr_truetrue_0_4_best_en.md b/docs/_posts/ahmedlone127/2024-09-23-portuguese_up_xlmr_truetrue_0_4_best_en.md new file mode 100644 index 00000000000000..ee6cc2d268a680 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-portuguese_up_xlmr_truetrue_0_4_best_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English portuguese_up_xlmr_truetrue_0_4_best XlmRoBertaForSequenceClassification from harish +author: John Snow Labs +name: portuguese_up_xlmr_truetrue_0_4_best +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`portuguese_up_xlmr_truetrue_0_4_best` is a English model originally trained by harish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_truetrue_0_4_best_en_5.5.0_3.0_1727089422482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/portuguese_up_xlmr_truetrue_0_4_best_en_5.5.0_3.0_1727089422482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("portuguese_up_xlmr_truetrue_0_4_best","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("portuguese_up_xlmr_truetrue_0_4_best", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|portuguese_up_xlmr_truetrue_0_4_best| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|788.1 MB| + +## References + +https://huggingface.co/harish/PT-UP-xlmR-TrueTrue-0_4_BEST \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-qa_model_gigazinie_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-qa_model_gigazinie_pipeline_en.md new file mode 100644 index 00000000000000..6602a4f025ee73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-qa_model_gigazinie_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English qa_model_gigazinie_pipeline pipeline BertForQuestionAnswering from Gigazinie +author: John Snow Labs +name: qa_model_gigazinie_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_model_gigazinie_pipeline` is a English model originally trained by Gigazinie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_model_gigazinie_pipeline_en_5.5.0_3.0_1727070468565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_model_gigazinie_pipeline_en_5.5.0_3.0_1727070468565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qa_model_gigazinie_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qa_model_gigazinie_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_model_gigazinie_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Gigazinie/QA_model + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_en.md new file mode 100644 index 00000000000000..60780cf865ec3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English question_answering_model_finetuned_on_bert BertForQuestionAnswering from Vardan-verma +author: John Snow Labs +name: question_answering_model_finetuned_on_bert +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answering_model_finetuned_on_bert` is a English model originally trained by Vardan-verma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answering_model_finetuned_on_bert_en_5.5.0_3.0_1727049603017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answering_model_finetuned_on_bert_en_5.5.0_3.0_1727049603017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("question_answering_model_finetuned_on_bert","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("question_answering_model_finetuned_on_bert", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answering_model_finetuned_on_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|797.4 MB| + +## References + +https://huggingface.co/Vardan-verma/Question_Answering_model_finetuned_on_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_pipeline_en.md new file mode 100644 index 00000000000000..72b8d1d5eb2f46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-question_answering_model_finetuned_on_bert_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English question_answering_model_finetuned_on_bert_pipeline pipeline BertForQuestionAnswering from Vardan-verma +author: John Snow Labs +name: question_answering_model_finetuned_on_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_answering_model_finetuned_on_bert_pipeline` is a English model originally trained by Vardan-verma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_answering_model_finetuned_on_bert_pipeline_en_5.5.0_3.0_1727049830318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_answering_model_finetuned_on_bert_pipeline_en_5.5.0_3.0_1727049830318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("question_answering_model_finetuned_on_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("question_answering_model_finetuned_on_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_answering_model_finetuned_on_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|797.4 MB| + +## References + +https://huggingface.co/Vardan-verma/Question_Answering_model_finetuned_on_bert + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_en.md b/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_en.md new file mode 100644 index 00000000000000..245df0fc3f88b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English red_green_classification_v3 DistilBertForSequenceClassification from pnr-svc +author: John Snow Labs +name: red_green_classification_v3 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`red_green_classification_v3` is a English model originally trained by pnr-svc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/red_green_classification_v3_en_5.5.0_3.0_1727108644725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/red_green_classification_v3_en_5.5.0_3.0_1727108644725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("red_green_classification_v3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("red_green_classification_v3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|red_green_classification_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pnr-svc/red-green-classification-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_en.md b/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_en.md new file mode 100644 index 00000000000000..0020eeea3adb0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English results_lr_0_001 DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: results_lr_0_001 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_lr_0_001` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_lr_0_001_en_5.5.0_3.0_1727073645417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_lr_0_001_en_5.5.0_3.0_1727073645417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_lr_0_001","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("results_lr_0_001", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_lr_0_001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/Benuehlinger/results_lr_0.001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_pipeline_en.md new file mode 100644 index 00000000000000..dcc3f90a959d5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-results_lr_0_001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_lr_0_001_pipeline pipeline DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: results_lr_0_001_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_lr_0_001_pipeline` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_lr_0_001_pipeline_en_5.5.0_3.0_1727073658319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_lr_0_001_pipeline_en_5.5.0_3.0_1727073658319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("results_lr_0_001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("results_lr_0_001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_lr_0_001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Benuehlinger/results_lr_0.001 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_en.md new file mode 100644 index 00000000000000..45b5f258aec717 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English reviews_classifier RoBertaForSequenceClassification from dariadaria +author: John Snow Labs +name: reviews_classifier +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reviews_classifier` is a English model originally trained by dariadaria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reviews_classifier_en_5.5.0_3.0_1727085667042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reviews_classifier_en_5.5.0_3.0_1727085667042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("reviews_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("reviews_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reviews_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/dariadaria/reviews_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_pipeline_en.md new file mode 100644 index 00000000000000..46aea56fb3f588 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-reviews_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English reviews_classifier_pipeline pipeline RoBertaForSequenceClassification from dariadaria +author: John Snow Labs +name: reviews_classifier_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`reviews_classifier_pipeline` is a English model originally trained by dariadaria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/reviews_classifier_pipeline_en_5.5.0_3.0_1727085690760.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/reviews_classifier_pipeline_en_5.5.0_3.0_1727085690760.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("reviews_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("reviews_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|reviews_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/dariadaria/reviews_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_nl.md b/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_nl.md new file mode 100644 index 00000000000000..0b8b92d8da932d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_nl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Dutch, Flemish robbert_2023_dutch_base_abb RoBertaEmbeddings from svercoutere +author: John Snow Labs +name: robbert_2023_dutch_base_abb +date: 2024-09-23 +tags: [nl, open_source, onnx, embeddings, roberta] +task: Embeddings +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_base_abb` is a Dutch, Flemish model originally trained by svercoutere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_base_abb_nl_5.5.0_3.0_1727065849629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_base_abb_nl_5.5.0_3.0_1727065849629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_base_abb","nl") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robbert_2023_dutch_base_abb","nl") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_base_abb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|nl| +|Size:|464.8 MB| + +## References + +https://huggingface.co/svercoutere/robbert-2023-dutch-base-abb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_pipeline_nl.md new file mode 100644 index 00000000000000..20e2b5c1f01704 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robbert_2023_dutch_base_abb_pipeline_nl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dutch, Flemish robbert_2023_dutch_base_abb_pipeline pipeline RoBertaEmbeddings from svercoutere +author: John Snow Labs +name: robbert_2023_dutch_base_abb_pipeline +date: 2024-09-23 +tags: [nl, open_source, pipeline, onnx] +task: Embeddings +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_2023_dutch_base_abb_pipeline` is a Dutch, Flemish model originally trained by svercoutere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_base_abb_pipeline_nl_5.5.0_3.0_1727065872429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_2023_dutch_base_abb_pipeline_nl_5.5.0_3.0_1727065872429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_2023_dutch_base_abb_pipeline", lang = "nl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_2023_dutch_base_abb_pipeline", lang = "nl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_2023_dutch_base_abb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|464.8 MB| + +## References + +https://huggingface.co/svercoutere/robbert-2023-dutch-base-abb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_en.md new file mode 100644 index 00000000000000..d4c955a42c3253 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_5_epochs RoBertaEmbeddings from mingAAA +author: John Snow Labs +name: roberta_base_5_epochs +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_5_epochs` is a English model originally trained by mingAAA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_5_epochs_en_5.5.0_3.0_1727056655328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_5_epochs_en_5.5.0_3.0_1727056655328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_5_epochs","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_5_epochs","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_5_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/mingAAA/roberta-base-5-epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_pipeline_en.md new file mode 100644 index 00000000000000..e01a4bdd2ace8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_5_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_5_epochs_pipeline pipeline RoBertaEmbeddings from mingAAA +author: John Snow Labs +name: roberta_base_5_epochs_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_5_epochs_pipeline` is a English model originally trained by mingAAA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_5_epochs_pipeline_en_5.5.0_3.0_1727056738251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_5_epochs_pipeline_en_5.5.0_3.0_1727056738251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_5_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_5_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_5_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/mingAAA/roberta-base-5-epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en.md new file mode 100644 index 00000000000000..843292f4d7f53d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en_5.5.0_3.0_1727055061144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_en_5.5.0_3.0_1727055061144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-TripAdvisorDomainAdaptation-finetuned-e2-RestMex2023-polaridad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en.md new file mode 100644 index 00000000000000..11d2e6cb6f2a39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline pipeline RoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en_5.5.0_3.0_1727055083718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline_en_5.5.0_3.0_1727055083718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_finetuned_tripadvisordomainadaptation_finetuned_e2_restmex2023_polaridad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/vg055/roberta-base-bne-finetuned-TripAdvisorDomainAdaptation-finetuned-e2-RestMex2023-polaridad + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_en.md new file mode 100644 index 00000000000000..3c841b249f2219 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_26 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_26 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_26` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_26_en_5.5.0_3.0_1727057123268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_26_en_5.5.0_3.0_1727057123268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_26","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_26","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_26| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_26 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_pipeline_en.md new file mode 100644 index 00000000000000..2d93559b863e0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_26_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_26_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_26_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_26_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_26_pipeline_en_5.5.0_3.0_1727057203348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_26_pipeline_en_5.5.0_3.0_1727057203348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_26_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_26_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_26_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_26 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_en.md new file mode 100644 index 00000000000000..f5201ab47cbd99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_36 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_36 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_36` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_36_en_5.5.0_3.0_1727080612388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_36_en_5.5.0_3.0_1727080612388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_36","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_36","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_36| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_36 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_pipeline_en.md new file mode 100644 index 00000000000000..420ec41e82953f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_36_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_36_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_36_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_36_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_36_pipeline_en_5.5.0_3.0_1727080698172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_36_pipeline_en_5.5.0_3.0_1727080698172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_36_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_36_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_36_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_36 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_pipeline_en.md new file mode 100644 index 00000000000000..b2ae5c6a659830 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_45_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_45_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_45_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_45_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_45_pipeline_en_5.5.0_3.0_1727056872118.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_45_pipeline_en_5.5.0_3.0_1727056872118.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_45_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_45_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_45 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_en.md new file mode 100644 index 00000000000000..84f923e93fe5df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_6ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_6ep +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_6ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_6ep_en_5.5.0_3.0_1727066025615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_6ep_en_5.5.0_3.0_1727066025615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_6ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_6ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_6ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-6ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_pipeline_en.md new file mode 100644 index 00000000000000..30efe62fd3831a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_6ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_6ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_6ep_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_6ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_6ep_pipeline_en_5.5.0_3.0_1727066048873.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_6ep_pipeline_en_5.5.0_3.0_1727066048873.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_6ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_6ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_6ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-6ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_en.md new file mode 100644 index 00000000000000..40777cde5a5887 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_7ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_7ep +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_7ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_7ep_en_5.5.0_3.0_1727092281288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_7ep_en_5.5.0_3.0_1727092281288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_7ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_manual_7ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_7ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-7ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_whisper_9ep_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_whisper_9ep_en.md new file mode 100644 index 00000000000000..6896f5ff5c0628 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_whisper_9ep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_whisper_9ep RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_whisper_9ep +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_whisper_9ep` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_9ep_en_5.5.0_3.0_1727092137100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_whisper_9ep_en_5.5.0_3.0_1727092137100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_9ep","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_finetuned_wallisian_whisper_9ep","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_whisper_9ep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-whisper-9ep \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_en.md new file mode 100644 index 00000000000000..8a9f5d3cd2d30a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetunedlabelsmooting_ner RoBertaForTokenClassification from lobrien001 +author: John Snow Labs +name: roberta_base_finetunedlabelsmooting_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetunedlabelsmooting_ner` is a English model originally trained by lobrien001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetunedlabelsmooting_ner_en_5.5.0_3.0_1727081796172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetunedlabelsmooting_ner_en_5.5.0_3.0_1727081796172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_finetunedlabelsmooting_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_finetunedlabelsmooting_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetunedlabelsmooting_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|359.3 MB| + +## References + +https://huggingface.co/lobrien001/roberta-base-finetunedLabelSmooting-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_pipeline_en.md new file mode 100644 index 00000000000000..54b39402c5d03d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetunedlabelsmooting_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetunedlabelsmooting_ner_pipeline pipeline RoBertaForTokenClassification from lobrien001 +author: John Snow Labs +name: roberta_base_finetunedlabelsmooting_ner_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetunedlabelsmooting_ner_pipeline` is a English model originally trained by lobrien001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetunedlabelsmooting_ner_pipeline_en_5.5.0_3.0_1727081861060.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetunedlabelsmooting_ner_pipeline_en_5.5.0_3.0_1727081861060.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetunedlabelsmooting_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetunedlabelsmooting_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetunedlabelsmooting_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|359.3 MB| + +## References + +https://huggingface.co/lobrien001/roberta-base-finetunedLabelSmooting-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_en.md new file mode 100644 index 00000000000000..f0a3d69b826b28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_defs_1h100r RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_defs_1h100r +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_defs_1h100r` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h100r_en_5.5.0_3.0_1727085791843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h100r_en_5.5.0_3.0_1727085791843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_defs_1h100r","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_hoax_classifier_defs_1h100r", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_defs_1h100r| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|451.1 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_defs_1h100r \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_pipeline_en.md new file mode 100644 index 00000000000000..ccefed147906a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_hoax_classifier_defs_1h100r_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_hoax_classifier_defs_1h100r_pipeline pipeline RoBertaForSequenceClassification from research-dump +author: John Snow Labs +name: roberta_base_hoax_classifier_defs_1h100r_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_hoax_classifier_defs_1h100r_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h100r_pipeline_en_5.5.0_3.0_1727085818982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_hoax_classifier_defs_1h100r_pipeline_en_5.5.0_3.0_1727085818982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_hoax_classifier_defs_1h100r_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_hoax_classifier_defs_1h100r_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_hoax_classifier_defs_1h100r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|451.1 MB| + +## References + +https://huggingface.co/research-dump/roberta-base_hoax_classifier_defs_1h100r + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_en.md new file mode 100644 index 00000000000000..5cdafe5f72b64a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_s2d_saved RoBertaForSequenceClassification from thaile +author: John Snow Labs +name: roberta_base_s2d_saved +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_s2d_saved` is a English model originally trained by thaile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_s2d_saved_en_5.5.0_3.0_1727085687280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_s2d_saved_en_5.5.0_3.0_1727085687280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_s2d_saved","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_s2d_saved", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_s2d_saved| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.0 MB| + +## References + +https://huggingface.co/thaile/roberta-base-s2d-saved \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_pipeline_en.md new file mode 100644 index 00000000000000..d58ee63bd8499c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_s2d_saved_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_s2d_saved_pipeline pipeline RoBertaForSequenceClassification from thaile +author: John Snow Labs +name: roberta_base_s2d_saved_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_s2d_saved_pipeline` is a English model originally trained by thaile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_s2d_saved_pipeline_en_5.5.0_3.0_1727085715448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_s2d_saved_pipeline_en_5.5.0_3.0_1727085715448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_s2d_saved_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_s2d_saved_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_s2d_saved_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.1 MB| + +## References + +https://huggingface.co/thaile/roberta-base-s2d-saved + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_en.md new file mode 100644 index 00000000000000..a4e3be82ec174f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_university_writing2 RoBertaEmbeddings from egumasa +author: John Snow Labs +name: roberta_base_university_writing2 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_university_writing2` is a English model originally trained by egumasa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_university_writing2_en_5.5.0_3.0_1727080448557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_university_writing2_en_5.5.0_3.0_1727080448557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_university_writing2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_university_writing2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_university_writing2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/egumasa/roberta-base-university-writing2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_pipeline_en.md new file mode 100644 index 00000000000000..acf8fbae17a855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_university_writing2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_university_writing2_pipeline pipeline RoBertaEmbeddings from egumasa +author: John Snow Labs +name: roberta_base_university_writing2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_university_writing2_pipeline` is a English model originally trained by egumasa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_university_writing2_pipeline_en_5.5.0_3.0_1727080470581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_university_writing2_pipeline_en_5.5.0_3.0_1727080470581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_university_writing2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_university_writing2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_university_writing2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/egumasa/roberta-base-university-writing2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_en.md new file mode 100644 index 00000000000000..f8e60dbe67b625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_finetuned_20 RoBertaEmbeddings from yzxjb +author: John Snow Labs +name: roberta_finetuned_20 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_20` is a English model originally trained by yzxjb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_20_en_5.5.0_3.0_1727056993808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_20_en_5.5.0_3.0_1727056993808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_finetuned_20","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_finetuned_20","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/yzxjb/roberta-finetuned-20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_pipeline_en.md new file mode 100644 index 00000000000000..6c3b29b2a13d4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_finetuned_20_pipeline pipeline RoBertaEmbeddings from yzxjb +author: John Snow Labs +name: roberta_finetuned_20_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_20_pipeline` is a English model originally trained by yzxjb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_20_pipeline_en_5.5.0_3.0_1727057015874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_20_pipeline_en_5.5.0_3.0_1727057015874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_finetuned_20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_finetuned_20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/yzxjb/roberta-finetuned-20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_en.md new file mode 100644 index 00000000000000..60a16f735b3f70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_finetuned_ner_hypnosis0930 RoBertaForTokenClassification from hypnosis0930 +author: John Snow Labs +name: roberta_large_finetuned_ner_hypnosis0930 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_ner_hypnosis0930` is a English model originally trained by hypnosis0930. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_hypnosis0930_en_5.5.0_3.0_1727072856735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_hypnosis0930_en_5.5.0_3.0_1727072856735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_ner_hypnosis0930","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_finetuned_ner_hypnosis0930", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_ner_hypnosis0930| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hypnosis0930/roberta-large-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_pipeline_en.md new file mode 100644 index 00000000000000..d3021effea35f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_finetuned_ner_hypnosis0930_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_ner_hypnosis0930_pipeline pipeline RoBertaForTokenClassification from hypnosis0930 +author: John Snow Labs +name: roberta_large_finetuned_ner_hypnosis0930_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_ner_hypnosis0930_pipeline` is a English model originally trained by hypnosis0930. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_hypnosis0930_pipeline_en_5.5.0_3.0_1727072925612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_ner_hypnosis0930_pipeline_en_5.5.0_3.0_1727072925612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_finetuned_ner_hypnosis0930_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_finetuned_ner_hypnosis0930_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_ner_hypnosis0930_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/hypnosis0930/roberta-large-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_en.md new file mode 100644 index 00000000000000..e078d0494fd5e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_gd1_v1 RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: roberta_large_gd1_v1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_gd1_v1` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_gd1_v1_en_5.5.0_3.0_1727054960782.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_gd1_v1_en_5.5.0_3.0_1727054960782.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_gd1_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_gd1_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_gd1_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/RoBERTa-large-GD1-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_pipeline_en.md new file mode 100644 index 00000000000000..be9c280d72515e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_gd1_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_gd1_v1_pipeline pipeline RoBertaForSequenceClassification from ericNguyen0132 +author: John Snow Labs +name: roberta_large_gd1_v1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_gd1_v1_pipeline` is a English model originally trained by ericNguyen0132. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_gd1_v1_pipeline_en_5.5.0_3.0_1727055025719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_gd1_v1_pipeline_en_5.5.0_3.0_1727055025719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_gd1_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_gd1_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_gd1_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ericNguyen0132/RoBERTa-large-GD1-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en.md new file mode 100644 index 00000000000000..46ed0be6ccd1fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_v2 RoBertaForTokenClassification from ktgiahieu +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_v2` is a English model originally trained by ktgiahieu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en_5.5.0_3.0_1727081817052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_v2_en_5.5.0_3.0_1727081817052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ktgiahieu/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en.md new file mode 100644 index 00000000000000..4c0b776b8464ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline pipeline RoBertaForTokenClassification from ktgiahieu +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline` is a English model originally trained by ktgiahieu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en_5.5.0_3.0_1727081884133.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline_en_5.5.0_3.0_1727081884133.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ktgiahieu/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_nba_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_nba_v2_en.md new file mode 100644 index 00000000000000..89f73c54aa3bd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_nba_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_nba_v2 RoBertaForSequenceClassification from sivakarri +author: John Snow Labs +name: roberta_nba_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_nba_v2` is a English model originally trained by sivakarri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_nba_v2_en_5.5.0_3.0_1727055315012.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_nba_v2_en_5.5.0_3.0_1727055315012.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_nba_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_nba_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_nba_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|430.7 MB| + +## References + +https://huggingface.co/sivakarri/roberta_nba_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_en.md new file mode 100644 index 00000000000000..7fc51a4d4400eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_ner_omvibhandik RoBertaForTokenClassification from OmVibhandik +author: John Snow Labs +name: roberta_ner_omvibhandik +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ner_omvibhandik` is a English model originally trained by OmVibhandik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ner_omvibhandik_en_5.5.0_3.0_1727081347076.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ner_omvibhandik_en_5.5.0_3.0_1727081347076.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ner_omvibhandik","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ner_omvibhandik", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ner_omvibhandik| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|429.4 MB| + +## References + +https://huggingface.co/OmVibhandik/roBERTa_NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_pipeline_en.md new file mode 100644 index 00000000000000..ef315ec209368f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_ner_omvibhandik_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_ner_omvibhandik_pipeline pipeline RoBertaForTokenClassification from OmVibhandik +author: John Snow Labs +name: roberta_ner_omvibhandik_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ner_omvibhandik_pipeline` is a English model originally trained by OmVibhandik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ner_omvibhandik_pipeline_en_5.5.0_3.0_1727081378852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ner_omvibhandik_pipeline_en_5.5.0_3.0_1727081378852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_ner_omvibhandik_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_ner_omvibhandik_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ner_omvibhandik_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|429.4 MB| + +## References + +https://huggingface.co/OmVibhandik/roBERTa_NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_en.md new file mode 100644 index 00000000000000..ea7d39bddfea47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_romanian RoBertaForTokenClassification from hellojimson +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_romanian +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_romanian` is a English model originally trained by hellojimson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_romanian_en_5.5.0_3.0_1727081261016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_romanian_en_5.5.0_3.0_1727081261016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_romanian","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_romanian", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_romanian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hellojimson/roberta-tagalog-base-ft-udpos213-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_pipeline_en.md new file mode 100644 index 00000000000000..249d11deb04039 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_romanian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_romanian_pipeline pipeline RoBertaForTokenClassification from hellojimson +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_romanian_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_romanian_pipeline` is a English model originally trained by hellojimson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_romanian_pipeline_en_5.5.0_3.0_1727081282609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_romanian_pipeline_en_5.5.0_3.0_1727081282609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_romanian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_romanian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_romanian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hellojimson/roberta-tagalog-base-ft-udpos213-ro + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_en.md new file mode 100644 index 00000000000000..38eb6838102e57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_top4lang RoBertaForTokenClassification from katrinatan +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_top4lang +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_top4lang` is a English model originally trained by katrinatan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top4lang_en_5.5.0_3.0_1727081261010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top4lang_en_5.5.0_3.0_1727081261010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_top4lang","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_tagalog_base_ft_udpos213_top4lang", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_top4lang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/katrinatan/roberta-tagalog-base-ft-udpos213-top4lang \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en.md new file mode 100644 index 00000000000000..176120a060987f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tagalog_base_ft_udpos213_top4lang_pipeline pipeline RoBertaForTokenClassification from katrinatan +author: John Snow Labs +name: roberta_tagalog_base_ft_udpos213_top4lang_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tagalog_base_ft_udpos213_top4lang_pipeline` is a English model originally trained by katrinatan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en_5.5.0_3.0_1727081282876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tagalog_base_ft_udpos213_top4lang_pipeline_en_5.5.0_3.0_1727081282876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tagalog_base_ft_udpos213_top4lang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tagalog_base_ft_udpos213_top4lang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tagalog_base_ft_udpos213_top4lang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/katrinatan/roberta-tagalog-base-ft-udpos213-top4lang + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_en.md new file mode 100644 index 00000000000000..818933759c0dcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_tomasz RoBertaEmbeddings from Tomasz +author: John Snow Labs +name: roberta_tomasz +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tomasz` is a English model originally trained by Tomasz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tomasz_en_5.5.0_3.0_1727080784279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tomasz_en_5.5.0_3.0_1727080784279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_tomasz","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_tomasz","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tomasz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|845.7 MB| + +## References + +https://huggingface.co/Tomasz/roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_pipeline_en.md new file mode 100644 index 00000000000000..806a4ef0d551c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_tomasz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_tomasz_pipeline pipeline RoBertaEmbeddings from Tomasz +author: John Snow Labs +name: roberta_tomasz_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_tomasz_pipeline` is a English model originally trained by Tomasz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_tomasz_pipeline_en_5.5.0_3.0_1727081025723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_tomasz_pipeline_en_5.5.0_3.0_1727081025723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_tomasz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_tomasz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_tomasz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|845.7 MB| + +## References + +https://huggingface.co/Tomasz/roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_en.md new file mode 100644 index 00000000000000..1daa4c4f317b6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_wiki_english RoBertaEmbeddings from juanquivilla +author: John Snow Labs +name: roberta_wiki_english +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_wiki_english` is a English model originally trained by juanquivilla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_wiki_english_en_5.5.0_3.0_1727066083725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_wiki_english_en_5.5.0_3.0_1727066083725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_wiki_english","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_wiki_english","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_wiki_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/juanquivilla/roberta-wiki-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_pipeline_en.md new file mode 100644 index 00000000000000..18c3f73e903380 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_wiki_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_wiki_english_pipeline pipeline RoBertaEmbeddings from juanquivilla +author: John Snow Labs +name: roberta_wiki_english_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_wiki_english_pipeline` is a English model originally trained by juanquivilla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_wiki_english_pipeline_en_5.5.0_3.0_1727066105583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_wiki_english_pipeline_en_5.5.0_3.0_1727066105583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_wiki_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_wiki_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_wiki_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/juanquivilla/roberta-wiki-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_en.md b/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_en.md new file mode 100644 index 00000000000000..cb708ab7e5a2cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertvarcs_10pc RoBertaEmbeddings from gnathoi +author: John Snow Labs +name: robertvarcs_10pc +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertvarcs_10pc` is a English model originally trained by gnathoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertvarcs_10pc_en_5.5.0_3.0_1727080857649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertvarcs_10pc_en_5.5.0_3.0_1727080857649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robertvarcs_10pc","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robertvarcs_10pc","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertvarcs_10pc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/gnathoi/RoBERTvarCS_10pc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_pipeline_en.md new file mode 100644 index 00000000000000..328a095e43b0d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-robertvarcs_10pc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robertvarcs_10pc_pipeline pipeline RoBertaEmbeddings from gnathoi +author: John Snow Labs +name: robertvarcs_10pc_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertvarcs_10pc_pipeline` is a English model originally trained by gnathoi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertvarcs_10pc_pipeline_en_5.5.0_3.0_1727080941881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertvarcs_10pc_pipeline_en_5.5.0_3.0_1727080941881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robertvarcs_10pc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robertvarcs_10pc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertvarcs_10pc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|298.2 MB| + +## References + +https://huggingface.co/gnathoi/RoBERTvarCS_10pc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_en.md b/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_en.md new file mode 100644 index 00000000000000..988459b1d8ddd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roboust_nlp_xlmr XlmRoBertaEmbeddings from Blue7Bird +author: John Snow Labs +name: roboust_nlp_xlmr +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roboust_nlp_xlmr` is a English model originally trained by Blue7Bird. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roboust_nlp_xlmr_en_5.5.0_3.0_1727071750749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roboust_nlp_xlmr_en_5.5.0_3.0_1727071750749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = XlmRoBertaEmbeddings.pretrained("roboust_nlp_xlmr","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = XlmRoBertaEmbeddings.pretrained("roboust_nlp_xlmr","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roboust_nlp_xlmr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[xlm_roberta]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Blue7Bird/Roboust_nlp_xlmr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_pipeline_en.md new file mode 100644 index 00000000000000..6b8479ffd6b9a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roboust_nlp_xlmr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roboust_nlp_xlmr_pipeline pipeline XlmRoBertaEmbeddings from Blue7Bird +author: John Snow Labs +name: roboust_nlp_xlmr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roboust_nlp_xlmr_pipeline` is a English model originally trained by Blue7Bird. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roboust_nlp_xlmr_pipeline_en_5.5.0_3.0_1727071802512.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roboust_nlp_xlmr_pipeline_en_5.5.0_3.0_1727071802512.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roboust_nlp_xlmr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roboust_nlp_xlmr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roboust_nlp_xlmr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Blue7Bird/Roboust_nlp_xlmr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_en.md b/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_en.md new file mode 100644 index 00000000000000..39ae1f0547a179 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ruroberta_large_neg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ruroberta_large_neg RoBertaForTokenClassification from DimasikKurd +author: John Snow Labs +name: ruroberta_large_neg +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruroberta_large_neg` is a English model originally trained by DimasikKurd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruroberta_large_neg_en_5.5.0_3.0_1727072654278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruroberta_large_neg_en_5.5.0_3.0_1727072654278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("ruroberta_large_neg","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("ruroberta_large_neg", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruroberta_large_neg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DimasikKurd/ruRoberta-large_neg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-s_ohm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-s_ohm_pipeline_en.md new file mode 100644 index 00000000000000..61c37d824d01af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-s_ohm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English s_ohm_pipeline pipeline RoBertaEmbeddings from anandohm +author: John Snow Labs +name: s_ohm_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`s_ohm_pipeline` is a English model originally trained by anandohm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/s_ohm_pipeline_en_5.5.0_3.0_1727091895560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/s_ohm_pipeline_en_5.5.0_3.0_1727091895560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("s_ohm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("s_ohm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|s_ohm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.1 MB| + +## References + +https://huggingface.co/anandohm/S_ohm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en.md b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en.md new file mode 100644 index 00000000000000..1e4ca78e19dded --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en_5.5.0_3.0_1727099539900.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_en_5.5.0_3.0_1727099539900.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|801.7 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-PO-COPY-D2_data-AmazonScience_massive_all_1_1_gamma \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en.md new file mode 100644 index 00000000000000..95429e74af41cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline` is a English model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en_5.5.0_3.0_1727099587307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline_en_5.5.0_3.0_1727099587307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_po_copy_d2_data_amazonscience_massive_all_1_1_gamma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|801.7 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-PO-COPY-D2_data-AmazonScience_massive_all_1_1_gamma + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx.md new file mode 100644 index 00000000000000..9e3ea0c97eb569 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline pipeline XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline +date: 2024-09-23 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline` is a Multilingual model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx_5.5.0_3.0_1727089013755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline_xx_5.5.0_3.0_1727089013755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|883.8 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-D2_data-cardiffnlp_tweet_sentiment_multilingual_all2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx.md b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx.md new file mode 100644 index 00000000000000..ec1da4abb0ed42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2 XlmRoBertaForSequenceClassification from haryoaw +author: John Snow Labs +name: scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2 +date: 2024-09-23 +tags: [xx, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2` is a Multilingual model originally trained by haryoaw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx_5.5.0_3.0_1727088971728.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2_xx_5.5.0_3.0_1727088971728.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|scenario_non_kd_scr_d2_data_cardiffnlp_tweet_sentiment_multilingual_all2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|883.7 MB| + +## References + +https://huggingface.co/haryoaw/scenario-NON-KD-SCR-D2_data-cardiffnlp_tweet_sentiment_multilingual_all2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-secondo_modello_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-secondo_modello_pipeline_en.md new file mode 100644 index 00000000000000..7964fd1acf4582 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-secondo_modello_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English secondo_modello_pipeline pipeline DistilBertForSequenceClassification from soniarocca31 +author: John Snow Labs +name: secondo_modello_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`secondo_modello_pipeline` is a English model originally trained by soniarocca31. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/secondo_modello_pipeline_en_5.5.0_3.0_1727074181815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/secondo_modello_pipeline_en_5.5.0_3.0_1727074181815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("secondo_modello_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("secondo_modello_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|secondo_modello_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/soniarocca31/secondo_modello + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_pipeline_en.md new file mode 100644 index 00000000000000..711f1e1eb518f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sector_multilabel_climatebert_f_pipeline pipeline RoBertaForSequenceClassification from GIZ +author: John Snow Labs +name: sector_multilabel_climatebert_f_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sector_multilabel_climatebert_f_pipeline` is a English model originally trained by GIZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sector_multilabel_climatebert_f_pipeline_en_5.5.0_3.0_1727085931657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sector_multilabel_climatebert_f_pipeline_en_5.5.0_3.0_1727085931657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sector_multilabel_climatebert_f_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sector_multilabel_climatebert_f_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sector_multilabel_climatebert_f_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.7 MB| + +## References + +https://huggingface.co/GIZ/SECTOR-multilabel-climatebert_f + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_en.md b/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_en.md new file mode 100644 index 00000000000000..dfd64313e38da0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English securebert_finetuned_autoisac RoBertaEmbeddings from frankharman +author: John Snow Labs +name: securebert_finetuned_autoisac +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`securebert_finetuned_autoisac` is a English model originally trained by frankharman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/securebert_finetuned_autoisac_en_5.5.0_3.0_1727080870169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/securebert_finetuned_autoisac_en_5.5.0_3.0_1727080870169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("securebert_finetuned_autoisac","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("securebert_finetuned_autoisac","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|securebert_finetuned_autoisac| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/frankharman/securebert-finetuned-autoisac \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_pipeline_en.md new file mode 100644 index 00000000000000..45967a9774ce20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-securebert_finetuned_autoisac_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English securebert_finetuned_autoisac_pipeline pipeline RoBertaEmbeddings from frankharman +author: John Snow Labs +name: securebert_finetuned_autoisac_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`securebert_finetuned_autoisac_pipeline` is a English model originally trained by frankharman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/securebert_finetuned_autoisac_pipeline_en_5.5.0_3.0_1727080892361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/securebert_finetuned_autoisac_pipeline_en_5.5.0_3.0_1727080892361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("securebert_finetuned_autoisac_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("securebert_finetuned_autoisac_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|securebert_finetuned_autoisac_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/frankharman/securebert-finetuned-autoisac + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_en.md new file mode 100644 index 00000000000000..520aab40ab8269 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_arabertmo_base_v8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_arabertmo_base_v8 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v8 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v8` is a English model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v8_en_5.5.0_3.0_1727091051376.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v8_en_5.5.0_3.0_1727091051376.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v8","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v8","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_en.md new file mode 100644 index 00000000000000..9a480ffd724920 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_portuguese_lenerbr_alynneoya_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_cased_portuguese_lenerbr_alynneoya BertSentenceEmbeddings from alynneoya +author: John Snow Labs +name: sent_bert_base_cased_portuguese_lenerbr_alynneoya +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_portuguese_lenerbr_alynneoya` is a English model originally trained by alynneoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_portuguese_lenerbr_alynneoya_en_5.5.0_3.0_1727113885461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_portuguese_lenerbr_alynneoya_en_5.5.0_3.0_1727113885461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_portuguese_lenerbr_alynneoya","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_portuguese_lenerbr_alynneoya","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_portuguese_lenerbr_alynneoya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alynneoya/bert-base-cased-pt-lenerbr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_en.md new file mode 100644 index 00000000000000..adac6a70ce43d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_cased_scmedium BertSentenceEmbeddings from CambridgeMolecularEngineering +author: John Snow Labs +name: sent_bert_base_cased_scmedium +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_scmedium` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_scmedium_en_5.5.0_3.0_1727113684994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_scmedium_en_5.5.0_3.0_1727113684994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_scmedium","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_scmedium","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_scmedium| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-cased-scmedium \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_pipeline_en.md new file mode 100644 index 00000000000000..6405e0a58ceb0c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_cased_scmedium_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_cased_scmedium_pipeline pipeline BertSentenceEmbeddings from CambridgeMolecularEngineering +author: John Snow Labs +name: sent_bert_base_cased_scmedium_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_scmedium_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_scmedium_pipeline_en_5.5.0_3.0_1727113704205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_scmedium_pipeline_en_5.5.0_3.0_1727113704205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_cased_scmedium_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_cased_scmedium_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_scmedium_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-cased-scmedium + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_pipeline_en.md new file mode 100644 index 00000000000000..1956b1825e2100 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_dutch_cased_finetuned_manx_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_dutch_cased_finetuned_manx_pipeline pipeline BertSentenceEmbeddings from Pyjay +author: John Snow Labs +name: sent_bert_base_dutch_cased_finetuned_manx_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_dutch_cased_finetuned_manx_pipeline` is a English model originally trained by Pyjay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_dutch_cased_finetuned_manx_pipeline_en_5.5.0_3.0_1727109982428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_dutch_cased_finetuned_manx_pipeline_en_5.5.0_3.0_1727109982428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_dutch_cased_finetuned_manx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_dutch_cased_finetuned_manx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_dutch_cased_finetuned_manx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/Pyjay/bert-base-dutch-cased-finetuned-gv + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_en.md new file mode 100644 index 00000000000000..34b0f53282efb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_german_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_german_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_german_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_german_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_german_cased_en_5.5.0_3.0_1727090987713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_german_cased_en_5.5.0_3.0_1727090987713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_german_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_german_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_german_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|421.8 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-de-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_en.md new file mode 100644 index 00000000000000..f94c750ec1651f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_lithuanian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_lithuanian_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_lithuanian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_lithuanian_cased_en_5.5.0_3.0_1727101824083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_lithuanian_cased_en_5.5.0_3.0_1727101824083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_lithuanian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_lithuanian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_lithuanian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.0 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-lt-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_pipeline_en.md new file mode 100644 index 00000000000000..0a1a79502b6ff9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_lithuanian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_lithuanian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_lithuanian_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_lithuanian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_lithuanian_cased_pipeline_en_5.5.0_3.0_1727101842957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_lithuanian_cased_pipeline_en_5.5.0_3.0_1727101842957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_lithuanian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_lithuanian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_lithuanian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.5 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-lt-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_en.md new file mode 100644 index 00000000000000..0024110692635e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_norwegian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_norwegian_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_norwegian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_norwegian_cased_en_5.5.0_3.0_1727105450905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_norwegian_cased_en_5.5.0_3.0_1727105450905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_norwegian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_norwegian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_norwegian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|415.6 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-no-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_pipeline_en.md new file mode 100644 index 00000000000000..8a0812407812e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_norwegian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_norwegian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_norwegian_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_norwegian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_norwegian_cased_pipeline_en_5.5.0_3.0_1727105470030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_norwegian_cased_pipeline_en_5.5.0_3.0_1727105470030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_norwegian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_norwegian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_norwegian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.1 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-no-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_spanish_portuguese_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_spanish_portuguese_cased_pipeline_en.md new file mode 100644 index 00000000000000..26cfe1bea9379f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_spanish_portuguese_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_spanish_portuguese_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_spanish_portuguese_cased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_spanish_portuguese_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_portuguese_cased_pipeline_en_5.5.0_3.0_1727091120598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_spanish_portuguese_cased_pipeline_en_5.5.0_3.0_1727091120598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_spanish_portuguese_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_spanish_portuguese_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_spanish_portuguese_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|428.5 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-es-pt-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_bg.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_bg.md new file mode 100644 index 00000000000000..650432e47095f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_bg.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bulgarian sent_bert_base_theseus_bulgarian BertSentenceEmbeddings from rmihaylov +author: John Snow Labs +name: sent_bert_base_theseus_bulgarian +date: 2024-09-23 +tags: [bg, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_theseus_bulgarian` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_theseus_bulgarian_bg_5.5.0_3.0_1727109588426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_theseus_bulgarian_bg_5.5.0_3.0_1727109588426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_theseus_bulgarian","bg") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_theseus_bulgarian","bg") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_theseus_bulgarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|bg| +|Size:|505.4 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-theseus-bg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_pipeline_bg.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_pipeline_bg.md new file mode 100644 index 00000000000000..7eecb91315b7c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_theseus_bulgarian_pipeline_bg.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Bulgarian sent_bert_base_theseus_bulgarian_pipeline pipeline BertSentenceEmbeddings from rmihaylov +author: John Snow Labs +name: sent_bert_base_theseus_bulgarian_pipeline +date: 2024-09-23 +tags: [bg, open_source, pipeline, onnx] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_theseus_bulgarian_pipeline` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1727109613013.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1727109613013.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_theseus_bulgarian_pipeline", lang = "bg") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_theseus_bulgarian_pipeline", lang = "bg") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_theseus_bulgarian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bg| +|Size:|506.0 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-theseus-bg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_da.md new file mode 100644 index 00000000000000..f5ad3f44d21eb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_da.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Danish sent_bert_base_uncased_danish BertSentenceEmbeddings from KennethTM +author: John Snow Labs +name: sent_bert_base_uncased_danish +date: 2024-09-23 +tags: [da, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_danish` is a Danish model originally trained by KennethTM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_danish_da_5.5.0_3.0_1727090892590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_danish_da_5.5.0_3.0_1727090892590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_danish","da") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_danish","da") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_danish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|da| +|Size:|408.0 MB| + +## References + +https://huggingface.co/KennethTM/bert-base-uncased-danish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_en.md new file mode 100644 index 00000000000000..6c39d5086a6444 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_news_1929_1932_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_news_1929_1932 BertSentenceEmbeddings from sally9805 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_news_1929_1932 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_news_1929_1932` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1929_1932_en_5.5.0_3.0_1727109834343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_news_1929_1932_en_5.5.0_3.0_1727109834343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_1929_1932","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_news_1929_1932","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_news_1929_1932| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1929-1932 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en.md new file mode 100644 index 00000000000000..1d376c78d9b557 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en_5.5.0_3.0_1727109887680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline_en_5.5.0_3.0_1727109887680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-2ep-lower + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_en.md new file mode 100644 index 00000000000000..976ba9d03bafeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_bh8648_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_bh8648 BertSentenceEmbeddings from bh8648 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_bh8648 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_bh8648` is a English model originally trained by bh8648. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_bh8648_en_5.5.0_3.0_1727104976102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_bh8648_en_5.5.0_3.0_1727104976102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_bh8648","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_bh8648","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_bh8648| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/bh8648/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_en.md new file mode 100644 index 00000000000000..52c89daafd17db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_seddiktrk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_seddiktrk BertSentenceEmbeddings from seddiktrk +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_seddiktrk +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_seddiktrk` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_seddiktrk_en_5.5.0_3.0_1727105288815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_seddiktrk_en_5.5.0_3.0_1727105288815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_seddiktrk","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_seddiktrk","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_seddiktrk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/seddiktrk/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en.md new file mode 100644 index 00000000000000..b8046eeb124abe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline pipeline BertSentenceEmbeddings from Intel +author: John Snow Labs +name: sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en_5.5.0_3.0_1727105153242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline_en_5.5.0_3.0_1727105153242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_mnli_sparse_70_unstructured_norwegian_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|226.5 MB| + +## References + +https://huggingface.co/Intel/bert-base-uncased-mnli-sparse-70-unstructured-no-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_en.md new file mode 100644 index 00000000000000..a125724a156c63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_finetuning_test_xiejiafang BertSentenceEmbeddings from xiejiafang +author: John Snow Labs +name: sent_bert_finetuning_test_xiejiafang +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_finetuning_test_xiejiafang` is a English model originally trained by xiejiafang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_finetuning_test_xiejiafang_en_5.5.0_3.0_1727114049947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_finetuning_test_xiejiafang_en_5.5.0_3.0_1727114049947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_finetuning_test_xiejiafang","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_finetuning_test_xiejiafang","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_finetuning_test_xiejiafang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/xiejiafang/bert_finetuning_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_pipeline_en.md new file mode 100644 index 00000000000000..fba687b1fdcb41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_finetuning_test_xiejiafang_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_finetuning_test_xiejiafang_pipeline pipeline BertSentenceEmbeddings from xiejiafang +author: John Snow Labs +name: sent_bert_finetuning_test_xiejiafang_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_finetuning_test_xiejiafang_pipeline` is a English model originally trained by xiejiafang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_finetuning_test_xiejiafang_pipeline_en_5.5.0_3.0_1727114068670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_finetuning_test_xiejiafang_pipeline_en_5.5.0_3.0_1727114068670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_finetuning_test_xiejiafang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_finetuning_test_xiejiafang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_finetuning_test_xiejiafang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/xiejiafang/bert_finetuning_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en.md new file mode 100644 index 00000000000000..b6ff98fb448791 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4 BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en_5.5.0_3.0_1727105145171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_en_5.5.0_3.0_1727105145171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en.md new file mode 100644 index 00000000000000..44caa589ecf81a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline pipeline BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en_5.5.0_3.0_1727105203785.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline_en_5.5.0_3.0_1727105203785.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en.md new file mode 100644 index 00000000000000..794ca55f74d47f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8 BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en_5.5.0_3.0_1727113756633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8_en_5.5.0_3.0_1727113756633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_fast_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-fast-8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_en.md new file mode 100644 index 00000000000000..b88113bb98fa48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_ltrc_telugu BertSentenceEmbeddings from ltrctelugu +author: John Snow Labs +name: sent_bert_ltrc_telugu +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_ltrc_telugu` is a English model originally trained by ltrctelugu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_ltrc_telugu_en_5.5.0_3.0_1727109713843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_ltrc_telugu_en_5.5.0_3.0_1727109713843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_ltrc_telugu","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_ltrc_telugu","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_ltrc_telugu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/ltrctelugu/bert_ltrc_telugu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_pipeline_en.md new file mode 100644 index 00000000000000..2b231e63a03d89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_ltrc_telugu_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_ltrc_telugu_pipeline pipeline BertSentenceEmbeddings from ltrctelugu +author: John Snow Labs +name: sent_bert_ltrc_telugu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_ltrc_telugu_pipeline` is a English model originally trained by ltrctelugu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_ltrc_telugu_pipeline_en_5.5.0_3.0_1727109732931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_ltrc_telugu_pipeline_en_5.5.0_3.0_1727109732931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_ltrc_telugu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_ltrc_telugu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_ltrc_telugu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.9 MB| + +## References + +https://huggingface.co/ltrctelugu/bert_ltrc_telugu + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en.md new file mode 100644 index 00000000000000..676a38de19e491 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_persian_farsi_base_uncased_nlp_course_hw2 BertSentenceEmbeddings from iMahdiGhazavi +author: John Snow Labs +name: sent_bert_persian_farsi_base_uncased_nlp_course_hw2 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_persian_farsi_base_uncased_nlp_course_hw2` is a English model originally trained by iMahdiGhazavi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en_5.5.0_3.0_1727102030567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_nlp_course_hw2_en_5.5.0_3.0_1727102030567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_persian_farsi_base_uncased_nlp_course_hw2","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_persian_farsi_base_uncased_nlp_course_hw2","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_persian_farsi_base_uncased_nlp_course_hw2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|605.8 MB| + +## References + +https://huggingface.co/iMahdiGhazavi/bert-fa-base-uncased-nlp-course-hw2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md new file mode 100644 index 00000000000000..7744f199de7266 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline pipeline BertSentenceEmbeddings from iMahdiGhazavi +author: John Snow Labs +name: sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline` is a English model originally trained by iMahdiGhazavi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en_5.5.0_3.0_1727102060003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline_en_5.5.0_3.0_1727102060003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_persian_farsi_base_uncased_nlp_course_hw2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|606.3 MB| + +## References + +https://huggingface.co/iMahdiGhazavi/bert-fa-base-uncased-nlp-course-hw2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_en.md new file mode 100644 index 00000000000000..fe20d5d7b03663 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_political_election2020_twitter_mlm BertSentenceEmbeddings from kornosk +author: John Snow Labs +name: sent_bert_political_election2020_twitter_mlm +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_political_election2020_twitter_mlm` is a English model originally trained by kornosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_political_election2020_twitter_mlm_en_5.5.0_3.0_1727113638727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_political_election2020_twitter_mlm_en_5.5.0_3.0_1727113638727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_political_election2020_twitter_mlm","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_political_election2020_twitter_mlm","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_political_election2020_twitter_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.6 MB| + +## References + +https://huggingface.co/kornosk/bert-political-election2020-twitter-mlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_pipeline_en.md new file mode 100644 index 00000000000000..8377db46395c01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_political_election2020_twitter_mlm_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_political_election2020_twitter_mlm_pipeline pipeline BertSentenceEmbeddings from kornosk +author: John Snow Labs +name: sent_bert_political_election2020_twitter_mlm_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_political_election2020_twitter_mlm_pipeline` is a English model originally trained by kornosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_political_election2020_twitter_mlm_pipeline_en_5.5.0_3.0_1727113658162.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_political_election2020_twitter_mlm_pipeline_en_5.5.0_3.0_1727113658162.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_political_election2020_twitter_mlm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_political_election2020_twitter_mlm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_political_election2020_twitter_mlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.1 MB| + +## References + +https://huggingface.co/kornosk/bert-political-election2020-twitter-mlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en.md new file mode 100644 index 00000000000000..bd2513f90b88b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline pipeline BertSentenceEmbeddings from dimpo +author: John Snow Labs +name: sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline` is a English model originally trained by dimpo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en_5.5.0_3.0_1727110082331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline_en_5.5.0_3.0_1727110082331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_pretrained_wikitext_2_raw_v1_dimpo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/dimpo/bert-pretrained-wikitext-2-raw-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_en.md new file mode 100644 index 00000000000000..477041cd34c380 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_tiny_finetuned_legal_definitions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_tiny_finetuned_legal_definitions BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_bert_tiny_finetuned_legal_definitions +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tiny_finetuned_legal_definitions` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tiny_finetuned_legal_definitions_en_5.5.0_3.0_1727113417295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tiny_finetuned_legal_definitions_en_5.5.0_3.0_1727113417295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tiny_finetuned_legal_definitions","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tiny_finetuned_legal_definitions","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tiny_finetuned_legal_definitions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/muhtasham/bert-tiny-finetuned-legal-definitions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_en.md new file mode 100644 index 00000000000000..28e8a3e74d021b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bertbased_hatespeech_pretrain BertSentenceEmbeddings from agvidit1 +author: John Snow Labs +name: sent_bertbased_hatespeech_pretrain +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbased_hatespeech_pretrain` is a English model originally trained by agvidit1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbased_hatespeech_pretrain_en_5.5.0_3.0_1727090966164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbased_hatespeech_pretrain_en_5.5.0_3.0_1727090966164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bertbased_hatespeech_pretrain","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bertbased_hatespeech_pretrain","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbased_hatespeech_pretrain| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/agvidit1/BertBased_HateSpeech_pretrain \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_pipeline_en.md new file mode 100644 index 00000000000000..86d566c40c8a5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bertbased_hatespeech_pretrain_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bertbased_hatespeech_pretrain_pipeline pipeline BertSentenceEmbeddings from agvidit1 +author: John Snow Labs +name: sent_bertbased_hatespeech_pretrain_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bertbased_hatespeech_pretrain_pipeline` is a English model originally trained by agvidit1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bertbased_hatespeech_pretrain_pipeline_en_5.5.0_3.0_1727090985489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bertbased_hatespeech_pretrain_pipeline_en_5.5.0_3.0_1727090985489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bertbased_hatespeech_pretrain_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bertbased_hatespeech_pretrain_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bertbased_hatespeech_pretrain_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/agvidit1/BertBased_HateSpeech_pretrain + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_pipeline_en.md new file mode 100644 index 00000000000000..31f2ba2b301d16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_clinical_pubmed_bert_base_128_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_clinical_pubmed_bert_base_128_pipeline pipeline BertSentenceEmbeddings from Tsubasaz +author: John Snow Labs +name: sent_clinical_pubmed_bert_base_128_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_clinical_pubmed_bert_base_128_pipeline` is a English model originally trained by Tsubasaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_128_pipeline_en_5.5.0_3.0_1727101771083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_clinical_pubmed_bert_base_128_pipeline_en_5.5.0_3.0_1727101771083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_clinical_pubmed_bert_base_128_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_clinical_pubmed_bert_base_128_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_clinical_pubmed_bert_base_128_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/Tsubasaz/clinical-pubmed-bert-base-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_pipeline_da.md new file mode 100644 index 00000000000000..0941fb4657f0ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_pipeline_da.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Danish sent_dajobbert_base_uncased_pipeline pipeline BertSentenceEmbeddings from jjzha +author: John Snow Labs +name: sent_dajobbert_base_uncased_pipeline +date: 2024-09-23 +tags: [da, open_source, pipeline, onnx] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dajobbert_base_uncased_pipeline` is a Danish model originally trained by jjzha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dajobbert_base_uncased_pipeline_da_5.5.0_3.0_1727109779805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dajobbert_base_uncased_pipeline_da_5.5.0_3.0_1727109779805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_dajobbert_base_uncased_pipeline", lang = "da") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_dajobbert_base_uncased_pipeline", lang = "da") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dajobbert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|411.9 MB| + +## References + +https://huggingface.co/jjzha/dajobbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_pipeline_en.md new file mode 100644 index 00000000000000..03b8f0c42c1d0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_defsent_bert_base_uncased_max_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_defsent_bert_base_uncased_max_pipeline pipeline BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_base_uncased_max_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_base_uncased_max_pipeline` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_max_pipeline_en_5.5.0_3.0_1727101710915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_max_pipeline_en_5.5.0_3.0_1727101710915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_defsent_bert_base_uncased_max_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_defsent_bert_base_uncased_max_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_base_uncased_max_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-base-uncased-max + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_en.md new file mode 100644 index 00000000000000..c5c12bf9677949 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_german_english_code_switching_bert BertSentenceEmbeddings from igorsterner +author: John Snow Labs +name: sent_german_english_code_switching_bert +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_german_english_code_switching_bert` is a English model originally trained by igorsterner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_german_english_code_switching_bert_en_5.5.0_3.0_1727109972427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_german_english_code_switching_bert_en_5.5.0_3.0_1727109972427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_german_english_code_switching_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_german_english_code_switching_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_german_english_code_switching_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|664.7 MB| + +## References + +https://huggingface.co/igorsterner/german-english-code-switching-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_pipeline_en.md new file mode 100644 index 00000000000000..9fb37c3a37cb95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_german_english_code_switching_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_german_english_code_switching_bert_pipeline pipeline BertSentenceEmbeddings from igorsterner +author: John Snow Labs +name: sent_german_english_code_switching_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_german_english_code_switching_bert_pipeline` is a English model originally trained by igorsterner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_german_english_code_switching_bert_pipeline_en_5.5.0_3.0_1727110003933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_german_english_code_switching_bert_pipeline_en_5.5.0_3.0_1727110003933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_german_english_code_switching_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_german_english_code_switching_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_german_english_code_switching_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|665.3 MB| + +## References + +https://huggingface.co/igorsterner/german-english-code-switching-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_en.md new file mode 100644 index 00000000000000..93b32d00058ea7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_hindi_bert BertSentenceEmbeddings from sukritin +author: John Snow Labs +name: sent_hindi_bert +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bert` is a English model originally trained by sukritin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_en_5.5.0_3.0_1727110200426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_en_5.5.0_3.0_1727110200426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|609.3 MB| + +## References + +https://huggingface.co/sukritin/hindi-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_en.md new file mode 100644 index 00000000000000..03fd992aef5b88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_roboust_nlp_xlmr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_roboust_nlp_xlmr XlmRoBertaSentenceEmbeddings from Blue7Bird +author: John Snow Labs +name: sent_roboust_nlp_xlmr +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_roboust_nlp_xlmr` is a English model originally trained by Blue7Bird. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roboust_nlp_xlmr_en_5.5.0_3.0_1727062754592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roboust_nlp_xlmr_en_5.5.0_3.0_1727062754592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_roboust_nlp_xlmr","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_roboust_nlp_xlmr","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roboust_nlp_xlmr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Blue7Bird/Roboust_nlp_xlmr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_en.md new file mode 100644 index 00000000000000..e134effc07c094 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_small_mlm_rotten_tomatoes_custom_tokenizer BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_small_mlm_rotten_tomatoes_custom_tokenizer +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_small_mlm_rotten_tomatoes_custom_tokenizer` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_small_mlm_rotten_tomatoes_custom_tokenizer_en_5.5.0_3.0_1727109900453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_small_mlm_rotten_tomatoes_custom_tokenizer_en_5.5.0_3.0_1727109900453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_small_mlm_rotten_tomatoes_custom_tokenizer","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_small_mlm_rotten_tomatoes_custom_tokenizer","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_small_mlm_rotten_tomatoes_custom_tokenizer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|106.9 MB| + +## References + +https://huggingface.co/muhtasham/small-mlm-rotten_tomatoes-custom-tokenizer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_en.md new file mode 100644 index 00000000000000..4ada972f6b36e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_tiny_mlm_glue_mrpc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_tiny_mlm_glue_mrpc BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_tiny_mlm_glue_mrpc +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_tiny_mlm_glue_mrpc` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_tiny_mlm_glue_mrpc_en_5.5.0_3.0_1727105587312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_tiny_mlm_glue_mrpc_en_5.5.0_3.0_1727105587312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_tiny_mlm_glue_mrpc","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_tiny_mlm_glue_mrpc","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_tiny_mlm_glue_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/muhtasham/tiny-mlm-glue-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_pipeline_en.md new file mode 100644 index 00000000000000..b89a25e39b2199 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_twitch_bert_base_cased_pytorch_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_twitch_bert_base_cased_pytorch_pipeline pipeline BertSentenceEmbeddings from veb +author: John Snow Labs +name: sent_twitch_bert_base_cased_pytorch_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_twitch_bert_base_cased_pytorch_pipeline` is a English model originally trained by veb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_twitch_bert_base_cased_pytorch_pipeline_en_5.5.0_3.0_1727113993094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_twitch_bert_base_cased_pytorch_pipeline_en_5.5.0_3.0_1727113993094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_twitch_bert_base_cased_pytorch_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_twitch_bert_base_cased_pytorch_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_twitch_bert_base_cased_pytorch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/veb/twitch-bert-base-cased-pytorch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_en.md new file mode 100644 index 00000000000000..fda2175d5086cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_viz_wiz_bert_base_uncased_f16 BertSentenceEmbeddings from eisenjulian +author: John Snow Labs +name: sent_viz_wiz_bert_base_uncased_f16 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_viz_wiz_bert_base_uncased_f16` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f16_en_5.5.0_3.0_1727091386839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f16_en_5.5.0_3.0_1727091386839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_viz_wiz_bert_base_uncased_f16","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_viz_wiz_bert_base_uncased_f16","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_viz_wiz_bert_base_uncased_f16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_pipeline_en.md new file mode 100644 index 00000000000000..83038442a9601c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_viz_wiz_bert_base_uncased_f16_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_viz_wiz_bert_base_uncased_f16_pipeline pipeline BertSentenceEmbeddings from eisenjulian +author: John Snow Labs +name: sent_viz_wiz_bert_base_uncased_f16_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_viz_wiz_bert_base_uncased_f16_pipeline` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f16_pipeline_en_5.5.0_3.0_1727091406148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f16_pipeline_en_5.5.0_3.0_1727091406148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_viz_wiz_bert_base_uncased_f16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_viz_wiz_bert_base_uncased_f16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_viz_wiz_bert_base_uncased_f16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f16 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_en.md new file mode 100644 index 00000000000000..093750a3cdc044 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_model RoBertaForSequenceClassification from fusersam +author: John Snow Labs +name: sentiment_analysis_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model` is a English model originally trained by fusersam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_en_5.5.0_3.0_1727054807998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_en_5.5.0_3.0_1727054807998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|436.3 MB| + +## References + +https://huggingface.co/fusersam/Sentiment-Analysis-Model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_en.md new file mode 100644 index 00000000000000..219fdb1a8436fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_model_kiranwood DistilBertForSequenceClassification from KiranWood +author: John Snow Labs +name: sentiment_analysis_model_kiranwood +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_kiranwood` is a English model originally trained by KiranWood. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_kiranwood_en_5.5.0_3.0_1727073619345.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_kiranwood_en_5.5.0_3.0_1727073619345.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_kiranwood","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_kiranwood", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_kiranwood| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KiranWood/sentiment-analysis-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_pipeline_en.md new file mode 100644 index 00000000000000..af3588a4d24765 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_kiranwood_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_model_kiranwood_pipeline pipeline DistilBertForSequenceClassification from KiranWood +author: John Snow Labs +name: sentiment_analysis_model_kiranwood_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_kiranwood_pipeline` is a English model originally trained by KiranWood. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_kiranwood_pipeline_en_5.5.0_3.0_1727073631451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_kiranwood_pipeline_en_5.5.0_3.0_1727073631451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_model_kiranwood_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_model_kiranwood_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_kiranwood_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KiranWood/sentiment-analysis-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_pipeline_en.md new file mode 100644 index 00000000000000..6ee479742695e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_model_pipeline pipeline RoBertaForSequenceClassification from fusersam +author: John Snow Labs +name: sentiment_analysis_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_pipeline` is a English model originally trained by fusersam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_pipeline_en_5.5.0_3.0_1727054841069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_pipeline_en_5.5.0_3.0_1727054841069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.3 MB| + +## References + +https://huggingface.co/fusersam/Sentiment-Analysis-Model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_en.md new file mode 100644 index 00000000000000..b06338172cfc45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_roberta_latest_e8_b16_data2 RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sentiment_roberta_latest_e8_b16_data2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_roberta_latest_e8_b16_data2` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_roberta_latest_e8_b16_data2_en_5.5.0_3.0_1727054731811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_roberta_latest_e8_b16_data2_en_5.5.0_3.0_1727054731811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_roberta_latest_e8_b16_data2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_roberta_latest_e8_b16_data2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_roberta_latest_e8_b16_data2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/JerryYanJiang/sentiment-roberta-latest-e8-b16-data2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_pipeline_en.md new file mode 100644 index 00000000000000..baf2ef047bdc10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_roberta_latest_e8_b16_data2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_roberta_latest_e8_b16_data2_pipeline pipeline RoBertaForSequenceClassification from JerryYanJiang +author: John Snow Labs +name: sentiment_roberta_latest_e8_b16_data2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_roberta_latest_e8_b16_data2_pipeline` is a English model originally trained by JerryYanJiang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_roberta_latest_e8_b16_data2_pipeline_en_5.5.0_3.0_1727054758471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_roberta_latest_e8_b16_data2_pipeline_en_5.5.0_3.0_1727054758471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_roberta_latest_e8_b16_data2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_roberta_latest_e8_b16_data2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_roberta_latest_e8_b16_data2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/JerryYanJiang/sentiment-roberta-latest-e8-b16-data2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_en.md new file mode 100644 index 00000000000000..f26658fe5039e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_bernice +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bernice_en_5.5.0_3.0_1727100070152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bernice_en_5.5.0_3.0_1727100070152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_bernice","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_bernice", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|790.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_pipeline_en.md new file mode 100644 index 00000000000000..ed1c5539ee25ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_bernice_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bernice_pipeline_en_5.5.0_3.0_1727100208066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_bernice_pipeline_en_5.5.0_3.0_1727100208066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed0_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed0_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|790.3 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..bf31d3594cb2cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1727054732920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1727054732920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..bf3dbe70fe212d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727054758620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727054758620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_sentiment_small_random3_seed0_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/sentiment-sentiment_small_random3_seed0-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_en.md new file mode 100644 index 00000000000000..9ccb25733a2d7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentimentanalysis_imdb DistilBertForSequenceClassification from johnchangbviwit +author: John Snow Labs +name: sentimentanalysis_imdb +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentimentanalysis_imdb` is a English model originally trained by johnchangbviwit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentimentanalysis_imdb_en_5.5.0_3.0_1727059885759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentimentanalysis_imdb_en_5.5.0_3.0_1727059885759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentimentanalysis_imdb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentimentanalysis_imdb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentimentanalysis_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/johnchangbviwit/sentimentanalysis-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_en.md b/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_en.md new file mode 100644 index 00000000000000..937bf0339e2dab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English slovakbert_upos RoBertaForTokenClassification from crabz +author: John Snow Labs +name: slovakbert_upos +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`slovakbert_upos` is a English model originally trained by crabz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/slovakbert_upos_en_5.5.0_3.0_1727072905424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/slovakbert_upos_en_5.5.0_3.0_1727072905424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("slovakbert_upos","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("slovakbert_upos", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|slovakbert_upos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|434.4 MB| + +## References + +https://huggingface.co/crabz/slovakbert-upos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_pipeline_en.md new file mode 100644 index 00000000000000..314e307e74efec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-slovakbert_upos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English slovakbert_upos_pipeline pipeline RoBertaForTokenClassification from crabz +author: John Snow Labs +name: slovakbert_upos_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`slovakbert_upos_pipeline` is a English model originally trained by crabz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/slovakbert_upos_pipeline_en_5.5.0_3.0_1727072937547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/slovakbert_upos_pipeline_en_5.5.0_3.0_1727072937547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("slovakbert_upos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("slovakbert_upos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|slovakbert_upos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|434.4 MB| + +## References + +https://huggingface.co/crabz/slovakbert-upos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-small_whisper_en.md b/docs/_posts/ahmedlone127/2024-09-23-small_whisper_en.md new file mode 100644 index 00000000000000..fda026bdf69b6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-small_whisper_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English small_whisper WhisperForCTC from sanjitaa +author: John Snow Labs +name: small_whisper +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`small_whisper` is a English model originally trained by sanjitaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/small_whisper_en_5.5.0_3.0_1727051286565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/small_whisper_en_5.5.0_3.0_1727051286565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("small_whisper","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("small_whisper", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|small_whisper| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sanjitaa/small-whisper \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-small_whisper_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-small_whisper_pipeline_en.md new file mode 100644 index 00000000000000..095de513ea109a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-small_whisper_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English small_whisper_pipeline pipeline WhisperForCTC from sanjitaa +author: John Snow Labs +name: small_whisper_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`small_whisper_pipeline` is a English model originally trained by sanjitaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/small_whisper_pipeline_en_5.5.0_3.0_1727051373275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/small_whisper_pipeline_en_5.5.0_3.0_1727051373275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("small_whisper_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("small_whisper_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|small_whisper_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/sanjitaa/small-whisper + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_en.md new file mode 100644 index 00000000000000..c55cdb41e96f07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst2_padding90model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst2_padding90model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_padding90model_en_5.5.0_3.0_1727082112010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_padding90model_en_5.5.0_3.0_1727082112010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst2_padding90model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst2_padding90model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst2_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_en.md new file mode 100644 index 00000000000000..aebb31cf4b9f56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sst5_padding20model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst5_padding20model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst5_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst5_padding20model_en_5.5.0_3.0_1727059549103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst5_padding20model_en_5.5.0_3.0_1727059549103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst5_padding20model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sst5_padding20model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst5_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst5_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_pipeline_en.md new file mode 100644 index 00000000000000..d58bb089b16ea9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sst5_padding20model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sst5_padding20model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst5_padding20model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst5_padding20model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst5_padding20model_pipeline_en_5.5.0_3.0_1727059561499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst5_padding20model_pipeline_en_5.5.0_3.0_1727059561499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sst5_padding20model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sst5_padding20model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst5_padding20model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst5_padding20model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en.md new file mode 100644 index 00000000000000..cf7c174dd480e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en_5.5.0_3.0_1727082506338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45_en_5.5.0_3.0_1727082506338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_10_2024_07_26_12_23_45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-10-2024-07-26_12-23-45 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en.md new file mode 100644 index 00000000000000..81ec2205d84943 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en_5.5.0_3.0_1727059122330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45_en_5.5.0_3.0_1727059122330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_20_2024_07_26_12_23_45| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-20-2024-07-26_12-23-45 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_pipeline_en.md new file mode 100644 index 00000000000000..e0dedc9b143df3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-suicide_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English suicide_bert_pipeline pipeline RoBertaForSequenceClassification from vishalp23 +author: John Snow Labs +name: suicide_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_bert_pipeline` is a English model originally trained by vishalp23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_bert_pipeline_en_5.5.0_3.0_1727085396182.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_bert_pipeline_en_5.5.0_3.0_1727085396182.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("suicide_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("suicide_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|467.5 MB| + +## References + +https://huggingface.co/vishalp23/suicide-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_en.md b/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_en.md new file mode 100644 index 00000000000000..a7e10d308d2441 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-suicide_distilbert_6_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English suicide_distilbert_6_5 DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_6_5 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_6_5` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_6_5_en_5.5.0_3.0_1727073775043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_6_5_en_5.5.0_3.0_1727073775043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("suicide_distilbert_6_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("suicide_distilbert_6_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_6_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-6-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sungbeom_whisper_small_korean_set31_ko.md b/docs/_posts/ahmedlone127/2024-09-23-sungbeom_whisper_small_korean_set31_ko.md new file mode 100644 index 00000000000000..55678ddb124889 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sungbeom_whisper_small_korean_set31_ko.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Korean sungbeom_whisper_small_korean_set31 WhisperForCTC from maxseats +author: John Snow Labs +name: sungbeom_whisper_small_korean_set31 +date: 2024-09-23 +tags: [ko, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sungbeom_whisper_small_korean_set31` is a Korean model originally trained by maxseats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set31_ko_5.5.0_3.0_1727116366905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sungbeom_whisper_small_korean_set31_ko_5.5.0_3.0_1727116366905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("sungbeom_whisper_small_korean_set31","ko") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("sungbeom_whisper_small_korean_set31", "ko") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sungbeom_whisper_small_korean_set31| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/maxseats/SungBeom-whisper-small-ko-set31 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_en.md b/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_en.md new file mode 100644 index 00000000000000..47035c67e9faef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English supernatural_distilbert_prod DistilBertForSequenceClassification from banhabang +author: John Snow Labs +name: supernatural_distilbert_prod +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`supernatural_distilbert_prod` is a English model originally trained by banhabang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/supernatural_distilbert_prod_en_5.5.0_3.0_1727073931444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/supernatural_distilbert_prod_en_5.5.0_3.0_1727073931444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("supernatural_distilbert_prod","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("supernatural_distilbert_prod", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|supernatural_distilbert_prod| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/banhabang/Supernatural-distilbert-Prod \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_pipeline_en.md new file mode 100644 index 00000000000000..6ecbe82fa60a0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-supernatural_distilbert_prod_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English supernatural_distilbert_prod_pipeline pipeline DistilBertForSequenceClassification from banhabang +author: John Snow Labs +name: supernatural_distilbert_prod_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`supernatural_distilbert_prod_pipeline` is a English model originally trained by banhabang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/supernatural_distilbert_prod_pipeline_en_5.5.0_3.0_1727073944110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/supernatural_distilbert_prod_pipeline_en_5.5.0_3.0_1727073944110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("supernatural_distilbert_prod_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("supernatural_distilbert_prod_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|supernatural_distilbert_prod_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/banhabang/Supernatural-distilbert-Prod + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-swati_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-swati_model_en.md new file mode 100644 index 00000000000000..1808a98eefb7ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-swati_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English swati_model DistilBertForSequenceClassification from anth0nyhak1m +author: John Snow Labs +name: swati_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swati_model` is a English model originally trained by anth0nyhak1m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swati_model_en_5.5.0_3.0_1727082168595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swati_model_en_5.5.0_3.0_1727082168595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("swati_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("swati_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swati_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anth0nyhak1m/SS_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-swati_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-swati_model_pipeline_en.md new file mode 100644 index 00000000000000..2890991bad8153 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-swati_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English swati_model_pipeline pipeline DistilBertForSequenceClassification from anth0nyhak1m +author: John Snow Labs +name: swati_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swati_model_pipeline` is a English model originally trained by anth0nyhak1m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swati_model_pipeline_en_5.5.0_3.0_1727082180681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swati_model_pipeline_en_5.5.0_3.0_1727082180681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("swati_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("swati_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swati_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/anth0nyhak1m/SS_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_en.md b/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_en.md new file mode 100644 index 00000000000000..140eee095a1128 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-tamilroberto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tamilroberto RoBertaEmbeddings from apkbala107 +author: John Snow Labs +name: tamilroberto +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamilroberto` is a English model originally trained by apkbala107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamilroberto_en_5.5.0_3.0_1727056896281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamilroberto_en_5.5.0_3.0_1727056896281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("tamilroberto","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("tamilroberto","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamilroberto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|310.1 MB| + +## References + +https://huggingface.co/apkbala107/tamilroberto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_en.md b/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_en.md new file mode 100644 index 00000000000000..58810dfa657338 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test1_sss2000 DistilBertForSequenceClassification from sss2000 +author: John Snow Labs +name: test1_sss2000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_sss2000` is a English model originally trained by sss2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_sss2000_en_5.5.0_3.0_1727059348426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_sss2000_en_5.5.0_3.0_1727059348426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("test1_sss2000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test1_sss2000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_sss2000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sss2000/test1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_en.md b/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_en.md new file mode 100644 index 00000000000000..d60f8d2659afd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_model_peace5544 DistilBertForSequenceClassification from Peace5544 +author: John Snow Labs +name: test_model_peace5544 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_peace5544` is a English model originally trained by Peace5544. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_peace5544_en_5.5.0_3.0_1727110455138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_peace5544_en_5.5.0_3.0_1727110455138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_peace5544","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("test_model_peace5544", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_peace5544| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/Peace5544/test_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_pipeline_en.md new file mode 100644 index 00000000000000..2bf2e2a1c38396 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test_model_peace5544_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_model_peace5544_pipeline pipeline DistilBertForSequenceClassification from Peace5544 +author: John Snow Labs +name: test_model_peace5544_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_model_peace5544_pipeline` is a English model originally trained by Peace5544. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_model_peace5544_pipeline_en_5.5.0_3.0_1727110466791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_model_peace5544_pipeline_en_5.5.0_3.0_1727110466791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_model_peace5544_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_model_peace5544_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_model_peace5544_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Peace5544/test_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_en.md new file mode 100644 index 00000000000000..916430ae97e3ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_classification_model_elijahriley_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_classification_model_elijahriley DistilBertForSequenceClassification from elijahriley +author: John Snow Labs +name: text_classification_model_elijahriley +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_classification_model_elijahriley` is a English model originally trained by elijahriley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_classification_model_elijahriley_en_5.5.0_3.0_1727073736648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_classification_model_elijahriley_en_5.5.0_3.0_1727073736648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_model_elijahriley","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_classification_model_elijahriley", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_classification_model_elijahriley| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/elijahriley/text_classification_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_en.md new file mode 100644 index 00000000000000..cf2d1d6af391ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_entity_recognigtion DistilBertForSequenceClassification from Mohamedfasil +author: John Snow Labs +name: text_entity_recognigtion +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_entity_recognigtion` is a English model originally trained by Mohamedfasil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_entity_recognigtion_en_5.5.0_3.0_1727108447911.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_entity_recognigtion_en_5.5.0_3.0_1727108447911.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_entity_recognigtion","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_entity_recognigtion", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_entity_recognigtion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mohamedfasil/text-entity-recognigtion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_pipeline_en.md new file mode 100644 index 00000000000000..2eaeb6947958f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_entity_recognigtion_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_entity_recognigtion_pipeline pipeline DistilBertForSequenceClassification from Mohamedfasil +author: John Snow Labs +name: text_entity_recognigtion_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_entity_recognigtion_pipeline` is a English model originally trained by Mohamedfasil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_entity_recognigtion_pipeline_en_5.5.0_3.0_1727108459924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_entity_recognigtion_pipeline_en_5.5.0_3.0_1727108459924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_entity_recognigtion_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_entity_recognigtion_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_entity_recognigtion_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mohamedfasil/text-entity-recognigtion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en.md new file mode 100644 index 00000000000000..819bb011d0ff88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg DistilBertForSequenceClassification from acuvity +author: John Snow Labs +name: text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg` is a English model originally trained by acuvity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en_5.5.0_3.0_1727087011727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_en_5.5.0_3.0_1727087011727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/acuvity/text-subject_classification-distilbert-base-uncased-single_label-mgd_textbooks-zg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en.md new file mode 100644 index 00000000000000..d04c09e099c952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline pipeline DistilBertForSequenceClassification from acuvity +author: John Snow Labs +name: text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline` is a English model originally trained by acuvity. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en_5.5.0_3.0_1727087023830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline_en_5.5.0_3.0_1727087023830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text_subject_classification_distilbert_base_uncased_single_label_mgd_textbooks_zg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/acuvity/text-subject_classification-distilbert-base-uncased-single_label-mgd_textbooks-zg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_en.md new file mode 100644 index 00000000000000..e16e2aecf39585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random1_seed2_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed2_bernice +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed2_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed2_bernice_en_5.5.0_3.0_1727088493478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed2_bernice_en_5.5.0_3.0_1727088493478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random1_seed2_bernice","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random1_seed2_bernice", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed2_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.6 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed2-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_pipeline_en.md new file mode 100644 index 00000000000000..466d4276ed507f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-topic_topic_random1_seed2_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English topic_topic_random1_seed2_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed2_bernice_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed2_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed2_bernice_pipeline_en_5.5.0_3.0_1727088633000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed2_bernice_pipeline_en_5.5.0_3.0_1727088633000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("topic_topic_random1_seed2_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("topic_topic_random1_seed2_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed2_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|805.6 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed2-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-trialz_en.md b/docs/_posts/ahmedlone127/2024-09-23-trialz_en.md new file mode 100644 index 00000000000000..c3dc75a74b0b87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-trialz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trialz RoBertaEmbeddings from JoAmps +author: John Snow Labs +name: trialz +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trialz` is a English model originally trained by JoAmps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trialz_en_5.5.0_3.0_1727056713394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trialz_en_5.5.0_3.0_1727056713394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("trialz","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("trialz","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trialz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/JoAmps/trialz \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_en.md b/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_en.md new file mode 100644 index 00000000000000..9c7674f68f3976 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitterfin_padding90model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding90model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding90model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding90model_en_5.5.0_3.0_1727074141408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding90model_en_5.5.0_3.0_1727074141408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding90model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitterfin_padding90model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding90model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding90model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-user_476da26872df492f830a65925d422651_model_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-23-user_476da26872df492f830a65925d422651_model_pipeline_ja.md new file mode 100644 index 00000000000000..d05217b9b94359 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-user_476da26872df492f830a65925d422651_model_pipeline_ja.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Japanese user_476da26872df492f830a65925d422651_model_pipeline pipeline WhisperForCTC from hoangvanvietanh +author: John Snow Labs +name: user_476da26872df492f830a65925d422651_model_pipeline +date: 2024-09-23 +tags: [ja, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`user_476da26872df492f830a65925d422651_model_pipeline` is a Japanese model originally trained by hoangvanvietanh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/user_476da26872df492f830a65925d422651_model_pipeline_ja_5.5.0_3.0_1727076537830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/user_476da26872df492f830a65925d422651_model_pipeline_ja_5.5.0_3.0_1727076537830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("user_476da26872df492f830a65925d422651_model_pipeline", lang = "ja") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("user_476da26872df492f830a65925d422651_model_pipeline", lang = "ja") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|user_476da26872df492f830a65925d422651_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hoangvanvietanh/user_476da26872df492f830a65925d422651_model + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_pipeline_en.md new file mode 100644 index 00000000000000..ed371f2a543486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-viz_wiz_bert_base_uncased_f16_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English viz_wiz_bert_base_uncased_f16_pipeline pipeline BertEmbeddings from eisenjulian +author: John Snow Labs +name: viz_wiz_bert_base_uncased_f16_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`viz_wiz_bert_base_uncased_f16_pipeline` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/viz_wiz_bert_base_uncased_f16_pipeline_en_5.5.0_3.0_1727107606457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/viz_wiz_bert_base_uncased_f16_pipeline_en_5.5.0_3.0_1727107606457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("viz_wiz_bert_base_uncased_f16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("viz_wiz_bert_base_uncased_f16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|viz_wiz_bert_base_uncased_f16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f16 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_en.md b/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_en.md new file mode 100644 index 00000000000000..8a3b998e8f77c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whipser_small_r2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whipser_small_r2 WhisperForCTC from spsither +author: John Snow Labs +name: whipser_small_r2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whipser_small_r2` is a English model originally trained by spsither. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whipser_small_r2_en_5.5.0_3.0_1727053944271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whipser_small_r2_en_5.5.0_3.0_1727053944271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whipser_small_r2","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whipser_small_r2", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whipser_small_r2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/spsither/whipser-small-r2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_en.md new file mode 100644 index 00000000000000..8024a843184316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_6e_4_clean_legion_v2_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_6e_4_clean_legion_v2 WhisperForCTC from yusufagung29 +author: John Snow Labs +name: whisper_6e_4_clean_legion_v2 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_6e_4_clean_legion_v2` is a English model originally trained by yusufagung29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_v2_en_5.5.0_3.0_1727076301568.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_6e_4_clean_legion_v2_en_5.5.0_3.0_1727076301568.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_6e_4_clean_legion_v2","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_6e_4_clean_legion_v2", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_6e_4_clean_legion_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/yusufagung29/whisper_6e-4_clean_legion_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_en.md new file mode 100644 index 00000000000000..178e8b06b0d450 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_ai_nomi_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_ai_nomi WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nomi +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nomi` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nomi_en_5.5.0_3.0_1727117464662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nomi_en_5.5.0_3.0_1727117464662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_ai_nomi","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_ai_nomi", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nomi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nomi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_en.md new file mode 100644 index 00000000000000..da4ebdfa642f11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_base_english_tonga_tonga_islands_myst55h WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_base_english_tonga_tonga_islands_myst55h +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_english_tonga_tonga_islands_myst55h` is a English model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_english_tonga_tonga_islands_myst55h_en_5.5.0_3.0_1727051397134.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_english_tonga_tonga_islands_myst55h_en_5.5.0_3.0_1727051397134.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_english_tonga_tonga_islands_myst55h","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_english_tonga_tonga_islands_myst55h", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_english_tonga_tonga_islands_myst55h| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.7 MB| + +## References + +https://huggingface.co/rishabhjain16/whisper_base_en_to_myst55h \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en.md new file mode 100644 index 00000000000000..13d865f7b7feb7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_english_tonga_tonga_islands_myst55h_pipeline pipeline WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_base_english_tonga_tonga_islands_myst55h_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_english_tonga_tonga_islands_myst55h_pipeline` is a English model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en_5.5.0_3.0_1727051433286.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_english_tonga_tonga_islands_myst55h_pipeline_en_5.5.0_3.0_1727051433286.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_english_tonga_tonga_islands_myst55h_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_english_tonga_tonga_islands_myst55h_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_english_tonga_tonga_islands_myst55h_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|646.7 MB| + +## References + +https://huggingface.co/rishabhjain16/whisper_base_en_to_myst55h + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_pipeline_ta.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_pipeline_ta.md new file mode 100644 index 00000000000000..8f534695b74d93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_pipeline_ta.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Tamil whisper_base_tamil_parambharat_pipeline pipeline WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_base_tamil_parambharat_pipeline +date: 2024-09-23 +tags: [ta, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_tamil_parambharat_pipeline` is a Tamil model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_tamil_parambharat_pipeline_ta_5.5.0_3.0_1727052624577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_tamil_parambharat_pipeline_ta_5.5.0_3.0_1727052624577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_tamil_parambharat_pipeline", lang = "ta") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_tamil_parambharat_pipeline", lang = "ta") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_tamil_parambharat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ta| +|Size:|643.5 MB| + +## References + +https://huggingface.co/parambharat/whisper-base-ta + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_ta.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_ta.md new file mode 100644 index 00000000000000..db4d66840c346a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_tamil_parambharat_ta.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Tamil whisper_base_tamil_parambharat WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_base_tamil_parambharat +date: 2024-09-23 +tags: [ta, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_tamil_parambharat` is a Tamil model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_tamil_parambharat_ta_5.5.0_3.0_1727052588661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_tamil_parambharat_ta_5.5.0_3.0_1727052588661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_tamil_parambharat","ta") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_tamil_parambharat", "ta") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_tamil_parambharat| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ta| +|Size:|643.5 MB| + +## References + +https://huggingface.co/parambharat/whisper-base-ta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_th.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_th.md new file mode 100644 index 00000000000000..296980ad059a55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_thai_der_1_th.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Thai whisper_base_thai_der_1 WhisperForCTC from arun100 +author: John Snow Labs +name: whisper_base_thai_der_1 +date: 2024-09-23 +tags: [th, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: th +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_thai_der_1` is a Thai model originally trained by arun100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_thai_der_1_th_5.5.0_3.0_1727077814185.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_thai_der_1_th_5.5.0_3.0_1727077814185.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_base_thai_der_1","th") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_base_thai_der_1", "th") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_thai_der_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|th| +|Size:|642.6 MB| + +## References + +https://huggingface.co/arun100/whisper-base-thai-der-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_cantonese_cm_voice_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_cantonese_cm_voice_en.md new file mode 100644 index 00000000000000..7b6630754d8538 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_cantonese_cm_voice_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_cantonese_cm_voice WhisperForCTC from jed351 +author: John Snow Labs +name: whisper_medium_cantonese_cm_voice +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_cantonese_cm_voice` is a English model originally trained by jed351. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_cantonese_cm_voice_en_5.5.0_3.0_1727078780314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_cantonese_cm_voice_en_5.5.0_3.0_1727078780314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_medium_cantonese_cm_voice","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_medium_cantonese_cm_voice", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_cantonese_cm_voice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/jed351/whisper_medium_cantonese_cm_voice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_malay_augmented_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_malay_augmented_en.md new file mode 100644 index 00000000000000..a656ba665fe8b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_malay_augmented_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_malay_augmented WhisperForCTC from Scrya +author: John Snow Labs +name: whisper_medium_malay_augmented +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_malay_augmented` is a English model originally trained by Scrya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_malay_augmented_en_5.5.0_3.0_1727054306177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_malay_augmented_en_5.5.0_3.0_1727054306177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_medium_malay_augmented","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_medium_malay_augmented", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_malay_augmented| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/Scrya/whisper-medium-ms-augmented \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_en.md new file mode 100644 index 00000000000000..2c0540aada703f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_medium_portuguese_cv16_fleurs2_lr_wu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_portuguese_cv16_fleurs2_lr_wu WhisperForCTC from fsicoli +author: John Snow Labs +name: whisper_medium_portuguese_cv16_fleurs2_lr_wu +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_portuguese_cv16_fleurs2_lr_wu` is a English model originally trained by fsicoli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_portuguese_cv16_fleurs2_lr_wu_en_5.5.0_3.0_1727079959850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_portuguese_cv16_fleurs2_lr_wu_en_5.5.0_3.0_1727079959850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_medium_portuguese_cv16_fleurs2_lr_wu","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_medium_portuguese_cv16_fleurs2_lr_wu", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_portuguese_cv16_fleurs2_lr_wu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/fsicoli/whisper-medium-pt-cv16-fleurs2-lr-wu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_ar2_ar.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_ar2_ar.md new file mode 100644 index 00000000000000..ba1fc3f4023bd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_ar2_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_ar2 WhisperForCTC from whitefox123 +author: John Snow Labs +name: whisper_small_ar2 +date: 2024-09-23 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_ar2` is a Arabic model originally trained by whitefox123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_ar2_ar_5.5.0_3.0_1727119299746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_ar2_ar_5.5.0_3.0_1727119299746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_ar2","ar") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_ar2", "ar") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_ar2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/whitefox123/whisper-small-ar2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_en.md new file mode 100644 index 00000000000000..2704a0664fe06e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_arabic_kecil WhisperForCTC from mujadid-syahbana +author: John Snow Labs +name: whisper_small_arabic_kecil +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_kecil` is a English model originally trained by mujadid-syahbana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_kecil_en_5.5.0_3.0_1727076598671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_kecil_en_5.5.0_3.0_1727076598671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_arabic_kecil","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_kecil", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_kecil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.6 MB| + +## References + +https://huggingface.co/mujadid-syahbana/whisper-small-ar-kecil \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_pipeline_en.md new file mode 100644 index 00000000000000..6769a47d69fc89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_kecil_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_arabic_kecil_pipeline pipeline WhisperForCTC from mujadid-syahbana +author: John Snow Labs +name: whisper_small_arabic_kecil_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_kecil_pipeline` is a English model originally trained by mujadid-syahbana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_kecil_pipeline_en_5.5.0_3.0_1727076618022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_kecil_pipeline_en_5.5.0_3.0_1727076618022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabic_kecil_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabic_kecil_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_kecil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.6 MB| + +## References + +https://huggingface.co/mujadid-syahbana/whisper-small-ar-kecil + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_ar.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_ar.md new file mode 100644 index 00000000000000..64af8f0b360f2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_ar.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Arabic whisper_small_arabic_v2 WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_small_arabic_v2 +date: 2024-09-23 +tags: [ar, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_v2` is a Arabic model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_v2_ar_5.5.0_3.0_1727051664606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_v2_ar_5.5.0_3.0_1727051664606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_arabic_v2","ar") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_arabic_v2", "ar") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ymoslem/whisper-small-ar-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_pipeline_ar.md new file mode 100644 index 00000000000000..02c3dbf5a36926 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_arabic_v2_pipeline_ar.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Arabic whisper_small_arabic_v2_pipeline pipeline WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_small_arabic_v2_pipeline +date: 2024-09-23 +tags: [ar, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_arabic_v2_pipeline` is a Arabic model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_v2_pipeline_ar_5.5.0_3.0_1727051748920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_arabic_v2_pipeline_ar_5.5.0_3.0_1727051748920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_arabic_v2_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_arabic_v2_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_arabic_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ymoslem/whisper-small-ar-v2 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_pipeline_bn.md new file mode 100644 index 00000000000000..85787898720740 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_bangla_pipeline_bn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Bengali whisper_small_bangla_pipeline pipeline WhisperForCTC from ashrafulparan +author: John Snow Labs +name: whisper_small_bangla_pipeline +date: 2024-09-23 +tags: [bn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_bangla_pipeline` is a Bengali model originally trained by ashrafulparan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_bangla_pipeline_bn_5.5.0_3.0_1727051140208.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_bangla_pipeline_bn_5.5.0_3.0_1727051140208.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_bangla_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_bangla_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_bangla_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ashrafulparan/whisper-small-bangla + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_eu.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_eu.md new file mode 100644 index 00000000000000..d12d07f372432d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_eu.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Basque whisper_small_basque_xezpeleta WhisperForCTC from xezpeleta +author: John Snow Labs +name: whisper_small_basque_xezpeleta +date: 2024-09-23 +tags: [eu, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_basque_xezpeleta` is a Basque model originally trained by xezpeleta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_basque_xezpeleta_eu_5.5.0_3.0_1727118123821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_basque_xezpeleta_eu_5.5.0_3.0_1727118123821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_basque_xezpeleta","eu") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_basque_xezpeleta", "eu") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_basque_xezpeleta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|eu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xezpeleta/whisper-small-eu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_pipeline_eu.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_pipeline_eu.md new file mode 100644 index 00000000000000..c2757750e74b68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_basque_xezpeleta_pipeline_eu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Basque whisper_small_basque_xezpeleta_pipeline pipeline WhisperForCTC from xezpeleta +author: John Snow Labs +name: whisper_small_basque_xezpeleta_pipeline +date: 2024-09-23 +tags: [eu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: eu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_basque_xezpeleta_pipeline` is a Basque model originally trained by xezpeleta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_basque_xezpeleta_pipeline_eu_5.5.0_3.0_1727118205773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_basque_xezpeleta_pipeline_eu_5.5.0_3.0_1727118205773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_basque_xezpeleta_pipeline", lang = "eu") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_basque_xezpeleta_pipeline", lang = "eu") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_basque_xezpeleta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|eu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/xezpeleta/whisper-small-eu + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_zh.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_zh.md new file mode 100644 index 00000000000000..96c19fcf28fe38 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_chinese_cn_zh.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Chinese whisper_small_chinese_cn WhisperForCTC from JunSir +author: John Snow Labs +name: whisper_small_chinese_cn +date: 2024-09-23 +tags: [zh, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_chinese_cn` is a Chinese model originally trained by JunSir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_cn_zh_5.5.0_3.0_1727053083927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_chinese_cn_zh_5.5.0_3.0_1727053083927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_chinese_cn","zh") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_chinese_cn", "zh") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_chinese_cn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|zh| +|Size:|1.7 GB| + +## References + +https://huggingface.co/JunSir/whisper-small-zh-CN \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_da.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_da.md new file mode 100644 index 00000000000000..9ce5aa48c20a8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_da.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Danish whisper_small_danish_wasurats WhisperForCTC from WasuratS +author: John Snow Labs +name: whisper_small_danish_wasurats +date: 2024-09-23 +tags: [da, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_danish_wasurats` is a Danish model originally trained by WasuratS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_danish_wasurats_da_5.5.0_3.0_1727075824331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_danish_wasurats_da_5.5.0_3.0_1727075824331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_danish_wasurats","da") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_danish_wasurats", "da") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_danish_wasurats| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|da| +|Size:|1.7 GB| + +## References + +https://huggingface.co/WasuratS/whisper-small-da \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_pipeline_da.md new file mode 100644 index 00000000000000..a653f620c628ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_danish_wasurats_pipeline_da.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Danish whisper_small_danish_wasurats_pipeline pipeline WhisperForCTC from WasuratS +author: John Snow Labs +name: whisper_small_danish_wasurats_pipeline +date: 2024-09-23 +tags: [da, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_danish_wasurats_pipeline` is a Danish model originally trained by WasuratS. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_danish_wasurats_pipeline_da_5.5.0_3.0_1727075909323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_danish_wasurats_pipeline_da_5.5.0_3.0_1727075909323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_danish_wasurats_pipeline", lang = "da") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_danish_wasurats_pipeline", lang = "da") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_danish_wasurats_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|1.7 GB| + +## References + +https://huggingface.co/WasuratS/whisper-small-da + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_pipeline_dv.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_pipeline_dv.md new file mode 100644 index 00000000000000..b78371767deb43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_divehi_shahukareem_pipeline_dv.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dhivehi, Divehi, Maldivian whisper_small_divehi_shahukareem_pipeline pipeline WhisperForCTC from shahukareem +author: John Snow Labs +name: whisper_small_divehi_shahukareem_pipeline +date: 2024-09-23 +tags: [dv, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: dv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_divehi_shahukareem_pipeline` is a Dhivehi, Divehi, Maldivian model originally trained by shahukareem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_shahukareem_pipeline_dv_5.5.0_3.0_1727117112946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_divehi_shahukareem_pipeline_dv_5.5.0_3.0_1727117112946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_divehi_shahukareem_pipeline", lang = "dv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_divehi_shahukareem_pipeline", lang = "dv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_divehi_shahukareem_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|dv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/shahukareem/whisper-small-dv + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_auro_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_auro_pipeline_hi.md new file mode 100644 index 00000000000000..910dfddff375f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_auro_pipeline_hi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hindi whisper_small_hindi_auro_pipeline pipeline WhisperForCTC from auro +author: John Snow Labs +name: whisper_small_hindi_auro_pipeline +date: 2024-09-23 +tags: [hi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_auro_pipeline` is a Hindi model originally trained by auro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_auro_pipeline_hi_5.5.0_3.0_1727078576756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_auro_pipeline_hi_5.5.0_3.0_1727078576756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_hindi_auro_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_hindi_auro_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_auro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/auro/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_en.md new file mode 100644 index 00000000000000..47ff0168b394c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_reach_vb WhisperForCTC from reach-vb +author: John Snow Labs +name: whisper_small_hindi_reach_vb +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_reach_vb` is a English model originally trained by reach-vb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_reach_vb_en_5.5.0_3.0_1727116374943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_reach_vb_en_5.5.0_3.0_1727116374943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_hindi_reach_vb","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_reach_vb", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_reach_vb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/reach-vb/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_pipeline_en.md new file mode 100644 index 00000000000000..7000601151cc12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_hindi_reach_vb_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_reach_vb_pipeline pipeline WhisperForCTC from reach-vb +author: John Snow Labs +name: whisper_small_hindi_reach_vb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_reach_vb_pipeline` is a English model originally trained by reach-vb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_reach_vb_pipeline_en_5.5.0_3.0_1727116465843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_reach_vb_pipeline_en_5.5.0_3.0_1727116465843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_hindi_reach_vb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_hindi_reach_vb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_reach_vb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/reach-vb/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_en.md new file mode 100644 index 00000000000000..f48e7e7f2cd81c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_kdn WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_small_kdn +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_kdn` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_kdn_en_5.5.0_3.0_1727052386799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_kdn_en_5.5.0_3.0_1727052386799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_kdn","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_kdn", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_kdn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-small-kdn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_pipeline_mn.md new file mode 100644 index 00000000000000..be6d68fee37a77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_11_pipeline_mn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_11_pipeline pipeline WhisperForCTC from bayartsogt +author: John Snow Labs +name: whisper_small_mongolian_11_pipeline +date: 2024-09-23 +tags: [mn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_11_pipeline` is a Mongolian model originally trained by bayartsogt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_11_pipeline_mn_5.5.0_3.0_1727054037542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_11_pipeline_mn_5.5.0_3.0_1727054037542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_mongolian_11_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_mongolian_11_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bayartsogt/whisper-small-mn-11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_mn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_mn.md new file mode 100644 index 00000000000000..c55df9efe1286d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_mn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_cafet WhisperForCTC from Cafet +author: John Snow Labs +name: whisper_small_mongolian_cafet +date: 2024-09-23 +tags: [mn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_cafet` is a Mongolian model originally trained by Cafet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_cafet_mn_5.5.0_3.0_1727118411098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_cafet_mn_5.5.0_3.0_1727118411098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_cafet","mn") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_mongolian_cafet", "mn") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_cafet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Cafet/whisper-small-mongolian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_pipeline_mn.md new file mode 100644 index 00000000000000..3e7079cc0de108 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_mongolian_cafet_pipeline_mn.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Mongolian whisper_small_mongolian_cafet_pipeline pipeline WhisperForCTC from Cafet +author: John Snow Labs +name: whisper_small_mongolian_cafet_pipeline +date: 2024-09-23 +tags: [mn, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: mn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_mongolian_cafet_pipeline` is a Mongolian model originally trained by Cafet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_cafet_pipeline_mn_5.5.0_3.0_1727118496606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_mongolian_cafet_pipeline_mn_5.5.0_3.0_1727118496606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_mongolian_cafet_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_mongolian_cafet_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_mongolian_cafet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Cafet/whisper-small-mongolian + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_en.md new file mode 100644 index 00000000000000..3ec9ee25158184 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_nepal_bhasa_hindi WhisperForCTC from RamNaamSatyaHai +author: John Snow Labs +name: whisper_small_nepal_bhasa_hindi +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nepal_bhasa_hindi` is a English model originally trained by RamNaamSatyaHai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_hindi_en_5.5.0_3.0_1727075925615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_hindi_en_5.5.0_3.0_1727075925615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_nepal_bhasa_hindi","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_nepal_bhasa_hindi", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nepal_bhasa_hindi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/RamNaamSatyaHai/whisper-small_new-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_pipeline_en.md new file mode 100644 index 00000000000000..3911b7dbd9650d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_nepal_bhasa_hindi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_nepal_bhasa_hindi_pipeline pipeline WhisperForCTC from RamNaamSatyaHai +author: John Snow Labs +name: whisper_small_nepal_bhasa_hindi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_nepal_bhasa_hindi_pipeline` is a English model originally trained by RamNaamSatyaHai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_hindi_pipeline_en_5.5.0_3.0_1727076015404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_nepal_bhasa_hindi_pipeline_en_5.5.0_3.0_1727076015404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_nepal_bhasa_hindi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_nepal_bhasa_hindi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_nepal_bhasa_hindi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/RamNaamSatyaHai/whisper-small_new-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_en.md new file mode 100644 index 00000000000000..5436850458d08f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_np WhisperForCTC from bhimrazy +author: John Snow Labs +name: whisper_small_np +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_np` is a English model originally trained by bhimrazy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_np_en_5.5.0_3.0_1727116601657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_np_en_5.5.0_3.0_1727116601657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_np","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_np", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_np| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bhimrazy/whisper-small-np \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_pipeline_en.md new file mode 100644 index 00000000000000..ab6b4664760db0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_np_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_np_pipeline pipeline WhisperForCTC from bhimrazy +author: John Snow Labs +name: whisper_small_np_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_np_pipeline` is a English model originally trained by bhimrazy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_np_pipeline_en_5.5.0_3.0_1727116687886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_np_pipeline_en_5.5.0_3.0_1727116687886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_np_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_np_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_np_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/bhimrazy/whisper-small-np + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_ro.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_ro.md new file mode 100644 index 00000000000000..d3b0a43585c36a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_romanian_yehoward_ro.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian whisper_small_romanian_yehoward WhisperForCTC from Yehoward +author: John Snow Labs +name: whisper_small_romanian_yehoward +date: 2024-09-23 +tags: [ro, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_romanian_yehoward` is a Moldavian, Moldovan, Romanian model originally trained by Yehoward. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_yehoward_ro_5.5.0_3.0_1727079009473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_romanian_yehoward_ro_5.5.0_3.0_1727079009473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_romanian_yehoward","ro") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_romanian_yehoward", "ro") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_romanian_yehoward| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ro| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Yehoward/whisper-small-ro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_en.md new file mode 100644 index 00000000000000..689d90d1365a99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_russian_ord_4 WhisperForCTC from mizoru +author: John Snow Labs +name: whisper_small_russian_ord_4 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_russian_ord_4` is a English model originally trained by mizoru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_4_en_5.5.0_3.0_1727051982000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_4_en_5.5.0_3.0_1727051982000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_russian_ord_4","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_russian_ord_4", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_russian_ord_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mizoru/whisper-small-ru-ORD_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_pipeline_en.md new file mode 100644 index 00000000000000..39129523929de8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_russian_ord_4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_russian_ord_4_pipeline pipeline WhisperForCTC from mizoru +author: John Snow Labs +name: whisper_small_russian_ord_4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_russian_ord_4_pipeline` is a English model originally trained by mizoru. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_4_pipeline_en_5.5.0_3.0_1727052078189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_russian_ord_4_pipeline_en_5.5.0_3.0_1727052078189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_russian_ord_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_russian_ord_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_russian_ord_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/mizoru/whisper-small-ru-ORD_4 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_saudi_podcasts_asr_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_saudi_podcasts_asr_en.md new file mode 100644 index 00000000000000..6ead3048eaf9b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_saudi_podcasts_asr_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_saudi_podcasts_asr WhisperForCTC from HuggingPanda +author: John Snow Labs +name: whisper_small_saudi_podcasts_asr +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_saudi_podcasts_asr` is a English model originally trained by HuggingPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_saudi_podcasts_asr_en_5.5.0_3.0_1727079199652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_saudi_podcasts_asr_en_5.5.0_3.0_1727079199652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_saudi_podcasts_asr","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_saudi_podcasts_asr", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_saudi_podcasts_asr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/HuggingPanda/whisper-small-Saudi-Podcasts-ASR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_pipeline_ta.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_pipeline_ta.md new file mode 100644 index 00000000000000..3268c96f5820ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_pipeline_ta.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Tamil whisper_small_tamil_carlfeynman_pipeline pipeline WhisperForCTC from carlfeynman +author: John Snow Labs +name: whisper_small_tamil_carlfeynman_pipeline +date: 2024-09-23 +tags: [ta, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tamil_carlfeynman_pipeline` is a Tamil model originally trained by carlfeynman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_carlfeynman_pipeline_ta_5.5.0_3.0_1727077144726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_carlfeynman_pipeline_ta_5.5.0_3.0_1727077144726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_tamil_carlfeynman_pipeline", lang = "ta") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_tamil_carlfeynman_pipeline", lang = "ta") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tamil_carlfeynman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ta| +|Size:|1.7 GB| + +## References + +https://huggingface.co/carlfeynman/whisper-small-tamil + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_ta.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_ta.md new file mode 100644 index 00000000000000..976a862258eab0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tamil_carlfeynman_ta.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Tamil whisper_small_tamil_carlfeynman WhisperForCTC from carlfeynman +author: John Snow Labs +name: whisper_small_tamil_carlfeynman +date: 2024-09-23 +tags: [ta, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tamil_carlfeynman` is a Tamil model originally trained by carlfeynman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_carlfeynman_ta_5.5.0_3.0_1727077063936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tamil_carlfeynman_ta_5.5.0_3.0_1727077063936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_tamil_carlfeynman","ta") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_tamil_carlfeynman", "ta") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tamil_carlfeynman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ta| +|Size:|1.7 GB| + +## References + +https://huggingface.co/carlfeynman/whisper-small-tamil \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_sq.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_sq.md new file mode 100644 index 00000000000000..8bbeee9d48e8ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_sq.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Albanian whisper_small_tonga_tonga_islands_chuvash_albanian WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_small_tonga_tonga_islands_chuvash_albanian +date: 2024-09-23 +tags: [sq, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sq +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tonga_tonga_islands_chuvash_albanian` is a Albanian model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tonga_tonga_islands_chuvash_albanian_sq_5.5.0_3.0_1727117888841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tonga_tonga_islands_chuvash_albanian_sq_5.5.0_3.0_1727117888841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_tonga_tonga_islands_chuvash_albanian","sq") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_tonga_tonga_islands_chuvash_albanian", "sq") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tonga_tonga_islands_chuvash_albanian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sq| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rishabhjain16/whisper-small_to_cv_albanian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_en.md new file mode 100644 index 00000000000000..f8dc886ac5bc46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_synthesized_turkish_8_hour_hlr WhisperForCTC from alikanakar +author: John Snow Labs +name: whisper_synthesized_turkish_8_hour_hlr +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_synthesized_turkish_8_hour_hlr` is a English model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_8_hour_hlr_en_5.5.0_3.0_1727079416083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_8_hour_hlr_en_5.5.0_3.0_1727079416083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_synthesized_turkish_8_hour_hlr","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_synthesized_turkish_8_hour_hlr", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_synthesized_turkish_8_hour_hlr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alikanakar/whisper-synthesized-turkish-8-hour-hlr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_pipeline_en.md new file mode 100644 index 00000000000000..12c3218f54b2a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_synthesized_turkish_8_hour_hlr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_synthesized_turkish_8_hour_hlr_pipeline pipeline WhisperForCTC from alikanakar +author: John Snow Labs +name: whisper_synthesized_turkish_8_hour_hlr_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_synthesized_turkish_8_hour_hlr_pipeline` is a English model originally trained by alikanakar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_8_hour_hlr_pipeline_en_5.5.0_3.0_1727079496940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_synthesized_turkish_8_hour_hlr_pipeline_en_5.5.0_3.0_1727079496940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_synthesized_turkish_8_hour_hlr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_synthesized_turkish_8_hour_hlr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_synthesized_turkish_8_hour_hlr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/alikanakar/whisper-synthesized-turkish-8-hour-hlr + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_en.md new file mode 100644 index 00000000000000..c150301b8f6753 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_aug_on_fly WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_tiny_aug_on_fly +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_aug_on_fly` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_aug_on_fly_en_5.5.0_3.0_1727116764205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_aug_on_fly_en_5.5.0_3.0_1727116764205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_aug_on_fly","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_aug_on_fly", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_aug_on_fly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|379.0 MB| + +## References + +https://huggingface.co/thanhduycao/whisper-tiny-aug-on-fly \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_pipeline_en.md new file mode 100644 index 00000000000000..db120b08a8d3c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_aug_on_fly_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_aug_on_fly_pipeline pipeline WhisperForCTC from thanhduycao +author: John Snow Labs +name: whisper_tiny_aug_on_fly_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_aug_on_fly_pipeline` is a English model originally trained by thanhduycao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_aug_on_fly_pipeline_en_5.5.0_3.0_1727116789679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_aug_on_fly_pipeline_en_5.5.0_3.0_1727116789679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_aug_on_fly_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_aug_on_fly_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_aug_on_fly_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|379.0 MB| + +## References + +https://huggingface.co/thanhduycao/whisper-tiny-aug-on-fly + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_cb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_cb_pipeline_en.md new file mode 100644 index 00000000000000..bfd527914ba230 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_cb_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_cb_pipeline pipeline WhisperForCTC from mikemason +author: John Snow Labs +name: whisper_tiny_cb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_cb_pipeline` is a English model originally trained by mikemason. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_cb_pipeline_en_5.5.0_3.0_1727077622726.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_cb_pipeline_en_5.5.0_3.0_1727077622726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_cb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_cb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_cb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.9 MB| + +## References + +https://huggingface.co/mikemason/whisper-tiny-cb + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_en.md new file mode 100644 index 00000000000000..0e08d416b8583a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_ft_tts_english_welsh WhisperForCTC from DewiBrynJones +author: John Snow Labs +name: whisper_tiny_ft_tts_english_welsh +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_ft_tts_english_welsh` is a English model originally trained by DewiBrynJones. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_ft_tts_english_welsh_en_5.5.0_3.0_1727079068885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_ft_tts_english_welsh_en_5.5.0_3.0_1727079068885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_ft_tts_english_welsh","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_ft_tts_english_welsh", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_ft_tts_english_welsh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|389.8 MB| + +## References + +https://huggingface.co/DewiBrynJones/whisper-tiny-ft-tts-en-cy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_pipeline_en.md new file mode 100644 index 00000000000000..e141bc659d805a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_ft_tts_english_welsh_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_ft_tts_english_welsh_pipeline pipeline WhisperForCTC from DewiBrynJones +author: John Snow Labs +name: whisper_tiny_ft_tts_english_welsh_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_ft_tts_english_welsh_pipeline` is a English model originally trained by DewiBrynJones. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_ft_tts_english_welsh_pipeline_en_5.5.0_3.0_1727079089735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_ft_tts_english_welsh_pipeline_en_5.5.0_3.0_1727079089735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_ft_tts_english_welsh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_ft_tts_english_welsh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_ft_tts_english_welsh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.8 MB| + +## References + +https://huggingface.co/DewiBrynJones/whisper-tiny-ft-tts-en-cy + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_he.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_he.md new file mode 100644 index 00000000000000..070c8301888742 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_hebrew_modern_2_he.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hebrew whisper_tiny_hebrew_modern_2 WhisperForCTC from NS-Y +author: John Snow Labs +name: whisper_tiny_hebrew_modern_2 +date: 2024-09-23 +tags: [he, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_hebrew_modern_2` is a Hebrew model originally trained by NS-Y. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_he_5.5.0_3.0_1727117873987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_hebrew_modern_2_he_5.5.0_3.0_1727117873987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_hebrew_modern_2","he") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_hebrew_modern_2", "he") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_hebrew_modern_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|he| +|Size:|242.8 MB| + +## References + +https://huggingface.co/NS-Y/whisper-tiny-he-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_en.md new file mode 100644 index 00000000000000..bfcfc6958bf703 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_us_b_koopman WhisperForCTC from b-koopman +author: John Snow Labs +name: whisper_tiny_minds14_english_us_b_koopman +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_us_b_koopman` is a English model originally trained by b-koopman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_b_koopman_en_5.5.0_3.0_1727051409895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_b_koopman_en_5.5.0_3.0_1727051409895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_us_b_koopman","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_minds14_english_us_b_koopman", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_us_b_koopman| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/b-koopman/whisper-tiny-minds14-en-US \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_pipeline_en.md new file mode 100644 index 00000000000000..e003c7623b2018 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_sjdata_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_sjdata_pipeline pipeline WhisperForCTC from sjdata +author: John Snow Labs +name: whisper_tiny_minds14_sjdata_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_sjdata_pipeline` is a English model originally trained by sjdata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_sjdata_pipeline_en_5.5.0_3.0_1727076909305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_sjdata_pipeline_en_5.5.0_3.0_1727076909305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_sjdata_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_sjdata_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_sjdata_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/sjdata/whisper-tiny-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_sv.md b/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_sv.md new file mode 100644 index 00000000000000..e9e54b4ffa0a5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whispercheckpoints3_sv.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Swedish whispercheckpoints3 WhisperForCTC from Yulle +author: John Snow Labs +name: whispercheckpoints3 +date: 2024-09-23 +tags: [sv, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: sv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whispercheckpoints3` is a Swedish model originally trained by Yulle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whispercheckpoints3_sv_5.5.0_3.0_1727053029604.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whispercheckpoints3_sv_5.5.0_3.0_1727053029604.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whispercheckpoints3","sv") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whispercheckpoints3", "sv") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whispercheckpoints3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|sv| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Yulle/WhisperCheckpoints3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_en.md b/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_en.md new file mode 100644 index 00000000000000..9b6b4cf4a81cc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_tags DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_tags +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_tags` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_tags_en_5.5.0_3.0_1727108288657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_tags_en_5.5.0_3.0_1727108288657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mantisbt_test_tags","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mantisbt_test_tags", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_tags| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-tags \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_en.md b/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_en.md new file mode 100644 index 00000000000000..ecc4cf6b9a7bcc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wolof_maskedmodel RoBertaEmbeddings from gjonesQ02 +author: John Snow Labs +name: wolof_maskedmodel +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wolof_maskedmodel` is a English model originally trained by gjonesQ02. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wolof_maskedmodel_en_5.5.0_3.0_1727080695069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wolof_maskedmodel_en_5.5.0_3.0_1727080695069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("wolof_maskedmodel","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("wolof_maskedmodel","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wolof_maskedmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/gjonesQ02/WO_MaskedModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_pipeline_en.md new file mode 100644 index 00000000000000..fd25b10a4f5fbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-wolof_maskedmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wolof_maskedmodel_pipeline pipeline RoBertaEmbeddings from gjonesQ02 +author: John Snow Labs +name: wolof_maskedmodel_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wolof_maskedmodel_pipeline` is a English model originally trained by gjonesQ02. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wolof_maskedmodel_pipeline_en_5.5.0_3.0_1727080710145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wolof_maskedmodel_pipeline_en_5.5.0_3.0_1727080710145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("wolof_maskedmodel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("wolof_maskedmodel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wolof_maskedmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/gjonesQ02/WO_MaskedModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_es.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_es.md new file mode 100644 index 00000000000000..fde18acd4c0c84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_livingner3 XlmRoBertaForSequenceClassification from IIC +author: John Snow Labs +name: xlm_r_galen_livingner3 +date: 2024-09-23 +tags: [es, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_r_galen_livingner3` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner3_es_5.5.0_3.0_1727099436320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner3_es_5.5.0_3.0_1727099436320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_r_galen_livingner3","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_r_galen_livingner3", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_livingner3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-livingner3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_pipeline_es.md new file mode 100644 index 00000000000000..0171a39f1c9a5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_r_galen_livingner3_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_livingner3_pipeline pipeline XlmRoBertaForSequenceClassification from IIC +author: John Snow Labs +name: xlm_r_galen_livingner3_pipeline +date: 2024-09-23 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_r_galen_livingner3_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner3_pipeline_es_5.5.0_3.0_1727099485645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner3_pipeline_es_5.5.0_3.0_1727099485645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_r_galen_livingner3_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_r_galen_livingner3_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_livingner3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-livingner3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_en.md new file mode 100644 index 00000000000000..bb05dcd41c03eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_train_2 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_train_2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_train_2` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_2_en_5.5.0_3.0_1727099667830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_2_en_5.5.0_3.0_1727099667830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_train_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_train_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_train_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-train-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_pipeline_en.md new file mode 100644 index 00000000000000..01a851a7e0eb3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_train_2_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_train_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_train_2_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_2_pipeline_en_5.5.0_3.0_1727099797245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_2_pipeline_en_5.5.0_3.0_1727099797245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_final_mixed_train_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_final_mixed_train_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_train_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-train-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en.md new file mode 100644 index 00000000000000..c1ea13115ebd99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2 XlmRoBertaForSequenceClassification from RogerB +author: John Snow Labs +name: xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en_5.5.0_3.0_1727099771063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2_en_5.5.0_3.0_1727099771063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_sent2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-sent2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_en.md new file mode 100644 index 00000000000000..f00eff15aea086 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_clp XlmRoBertaForSequenceClassification from clp +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_clp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_clp` is a English model originally trained by clp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_clp_en_5.5.0_3.0_1727088208546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_clp_en_5.5.0_3.0_1727088208546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_clp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_clp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_clp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/clp/xlm-roberta-base-finetuned-marc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_pipeline_en.md new file mode 100644 index 00000000000000..c871838f7ef236 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_marc_clp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_clp_pipeline pipeline XlmRoBertaForSequenceClassification from clp +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_clp_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_clp_pipeline` is a English model originally trained by clp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_clp_pipeline_en_5.5.0_3.0_1727088289384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_clp_pipeline_en_5.5.0_3.0_1727088289384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_clp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_marc_clp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_clp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/clp/xlm-roberta-base-finetuned-marc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en.md new file mode 100644 index 00000000000000..d24e41e094ad29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jx7789_pipeline pipeline XlmRoBertaForTokenClassification from jx7789 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jx7789_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_jx7789_pipeline` is a English model originally trained by jx7789. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en_5.5.0_3.0_1727061706795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jx7789_pipeline_en_5.5.0_3.0_1727061706795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jx7789_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jx7789_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jx7789_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/jx7789/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_en.md new file mode 100644 index 00000000000000..7b5d02aed12ce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_lee_soha XlmRoBertaForTokenClassification from Lee-soha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_lee_soha +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_lee_soha` is a English model originally trained by Lee-soha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_lee_soha_en_5.5.0_3.0_1727062311153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_lee_soha_en_5.5.0_3.0_1727062311153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_lee_soha","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_lee_soha", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_lee_soha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Lee-soha/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en.md new file mode 100644 index 00000000000000..4bfc4638d2f7ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline pipeline XlmRoBertaForTokenClassification from Lee-soha +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline` is a English model originally trained by Lee-soha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en_5.5.0_3.0_1727062395510.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline_en_5.5.0_3.0_1727062395510.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_lee_soha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Lee-soha/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_en.md new file mode 100644 index 00000000000000..3f0643eda98851 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_mjqing XlmRoBertaForTokenClassification from MJQing +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_mjqing +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_mjqing` is a English model originally trained by MJQing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mjqing_en_5.5.0_3.0_1727061524790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mjqing_en_5.5.0_3.0_1727061524790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_mjqing","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_mjqing", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_mjqing| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/MJQing/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en.md new file mode 100644 index 00000000000000..1f9280c19a32f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline pipeline XlmRoBertaForTokenClassification from MJQing +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline` is a English model originally trained by MJQing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en_5.5.0_3.0_1727061612868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline_en_5.5.0_3.0_1727061612868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_mjqing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/MJQing/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en.md new file mode 100644 index 00000000000000..942ebdc317b629 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_sh_zheng XlmRoBertaForTokenClassification from sh-zheng +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_sh_zheng +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_sh_zheng` is a English model originally trained by sh-zheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en_5.5.0_3.0_1727061887846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sh_zheng_en_5.5.0_3.0_1727061887846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_sh_zheng","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_sh_zheng", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_sh_zheng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/sh-zheng/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en.md new file mode 100644 index 00000000000000..8572ba7a93a9e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline pipeline XlmRoBertaForTokenClassification from sh-zheng +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline` is a English model originally trained by sh-zheng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en_5.5.0_3.0_1727061975805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline_en_5.5.0_3.0_1727061975805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_sh_zheng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/sh-zheng/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_jiogenes_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_jiogenes_en.md new file mode 100644 index 00000000000000..cf16ede2f0a9d0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_jiogenes_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jiogenes XlmRoBertaForTokenClassification from jiogenes +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jiogenes +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_jiogenes` is a English model originally trained by jiogenes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jiogenes_en_5.5.0_3.0_1727061753198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jiogenes_en_5.5.0_3.0_1727061753198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jiogenes","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jiogenes", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jiogenes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|839.7 MB| + +## References + +https://huggingface.co/jiogenes/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en.md new file mode 100644 index 00000000000000..df240949465c3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline pipeline XlmRoBertaForTokenClassification from penguinman73 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline` is a English model originally trained by penguinman73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en_5.5.0_3.0_1727062120759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline_en_5.5.0_3.0_1727062120759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_penguinman73_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/penguinman73/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_en.md new file mode 100644 index 00000000000000..1b998610d0544e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_italian_agvelu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_agvelu XlmRoBertaForTokenClassification from agvelu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_agvelu +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_agvelu` is a English model originally trained by agvelu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_agvelu_en_5.5.0_3.0_1727061995703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_agvelu_en_5.5.0_3.0_1727061995703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_agvelu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_agvelu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_agvelu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/agvelu/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_pipeline_bn.md new file mode 100644 index 00000000000000..1f668c4c57d2e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_hate_speech_ben_hin_pipeline_bn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Bengali xlm_roberta_base_hate_speech_ben_hin_pipeline pipeline XlmRoBertaForSequenceClassification from kingshukroy +author: John Snow Labs +name: xlm_roberta_base_hate_speech_ben_hin_pipeline +date: 2024-09-23 +tags: [bn, open_source, pipeline, onnx] +task: Text Classification +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_hate_speech_ben_hin_pipeline` is a Bengali model originally trained by kingshukroy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_hate_speech_ben_hin_pipeline_bn_5.5.0_3.0_1727089444477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_hate_speech_ben_hin_pipeline_bn_5.5.0_3.0_1727089444477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_hate_speech_ben_hin_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_hate_speech_ben_hin_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_hate_speech_ben_hin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|791.2 MB| + +## References + +https://huggingface.co/kingshukroy/xlm-roberta-base-hate-speech-ben-hin + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..1ec17258b44315 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_language_detection_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_language_detection_finetuned_pipeline pipeline XlmRoBertaForSequenceClassification from RonTon05 +author: John Snow Labs +name: xlm_roberta_base_language_detection_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_language_detection_finetuned_pipeline` is a English model originally trained by RonTon05. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_finetuned_pipeline_en_5.5.0_3.0_1727088824556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_language_detection_finetuned_pipeline_en_5.5.0_3.0_1727088824556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_language_detection_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_language_detection_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_language_detection_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|890.4 MB| + +## References + +https://huggingface.co/RonTon05/xlm-roberta-base-language-detection-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_en.md new file mode 100644 index 00000000000000..5839ea1fd16215 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_seed42_amh_esp_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_seed42_amh_esp_eng_train +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_seed42_amh_esp_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727099233198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727099233198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_seed42_amh_esp_eng_train","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_seed42_amh_esp_eng_train", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_seed42_amh_esp_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|810.2 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_seed42_amh-esp-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..6d35e264979520 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_seed42_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_seed42_amh_esp_eng_train_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_seed42_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727099363649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727099363649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_seed42_amh_esp_eng_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_seed42_amh_esp_eng_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_seed42_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|810.2 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_seed42_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_en.md new file mode 100644 index 00000000000000..f08a63f29aada7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_text_classification XlmRoBertaForSequenceClassification from CeroShrijver +author: John Snow Labs +name: xlm_roberta_base_text_classification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_text_classification` is a English model originally trained by CeroShrijver. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_text_classification_en_5.5.0_3.0_1727088644804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_text_classification_en_5.5.0_3.0_1727088644804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_text_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_text_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_text_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|838.1 MB| + +## References + +https://huggingface.co/CeroShrijver/xlm-roberta-base-text-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_pipeline_en.md new file mode 100644 index 00000000000000..2bc84da58c1ccf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_text_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_text_classification_pipeline pipeline XlmRoBertaForSequenceClassification from CeroShrijver +author: John Snow Labs +name: xlm_roberta_base_text_classification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_text_classification_pipeline` is a English model originally trained by CeroShrijver. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_text_classification_pipeline_en_5.5.0_3.0_1727088729153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_text_classification_pipeline_en_5.5.0_3.0_1727088729153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_text_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_text_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_text_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|838.1 MB| + +## References + +https://huggingface.co/CeroShrijver/xlm-roberta-base-text-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en.md new file mode 100644 index 00000000000000..6d6d5465f86b92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1727088654178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1727088654178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_tweet_sentiment_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|647.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-tweet-sentiment-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en.md new file mode 100644 index 00000000000000..f946bf281507eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_italian_tweet_sentiment_italian XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_italian_tweet_sentiment_italian +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_italian_tweet_sentiment_italian` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en_5.5.0_3.0_1727099902553.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_en_5.5.0_3.0_1727099902553.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_tweet_sentiment_italian","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_tweet_sentiment_italian", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_italian_tweet_sentiment_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|457.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-it-tweet-sentiment-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en.md new file mode 100644 index 00000000000000..0f73d9e1ef2359 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en_5.5.0_3.0_1727099650809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000_en_5.5.0_3.0_1727099650809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_tweet_sentiment_french_trimmed_french_10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|349.5 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-tweet-sentiment-fr-trimmed-fr-10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_en.md new file mode 100644 index 00000000000000..4446d63beeaf57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_estonian_english_all_shuffled_2020_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_estonian_english_all_shuffled_2020_test1000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_estonian_english_all_shuffled_2020_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_2020_test1000_en_5.5.0_3.0_1727099510614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_2020_test1000_en_5.5.0_3.0_1727099510614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_estonian_english_all_shuffled_2020_test1000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_estonian_english_all_shuffled_2020_test1000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_estonian_english_all_shuffled_2020_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|819.1 MB| + +## References + +https://huggingface.co/patpizio/xlmr-et-en-all_shuffled-2020-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en.md new file mode 100644 index 00000000000000..8fd3a51adaa510 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_estonian_english_all_shuffled_2020_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_estonian_english_all_shuffled_2020_test1000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_estonian_english_all_shuffled_2020_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en_5.5.0_3.0_1727099627897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_2020_test1000_pipeline_en_5.5.0_3.0_1727099627897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_estonian_english_all_shuffled_2020_test1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_estonian_english_all_shuffled_2020_test1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_estonian_english_all_shuffled_2020_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|819.1 MB| + +## References + +https://huggingface.co/patpizio/xlmr-et-en-all_shuffled-2020-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_en.md new file mode 100644 index 00000000000000..f6818a7086d018 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_romanian_english_all_shuffled_764_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_romanian_english_all_shuffled_764_test1000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_romanian_english_all_shuffled_764_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_764_test1000_en_5.5.0_3.0_1727099124966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_764_test1000_en_5.5.0_3.0_1727099124966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_romanian_english_all_shuffled_764_test1000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_romanian_english_all_shuffled_764_test1000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_romanian_english_all_shuffled_764_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|820.4 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ro-en-all_shuffled-764-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en.md new file mode 100644 index 00000000000000..76d961336dbea0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_romanian_english_all_shuffled_764_test1000_pipeline pipeline XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_romanian_english_all_shuffled_764_test1000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_romanian_english_all_shuffled_764_test1000_pipeline` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en_5.5.0_3.0_1727099236724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_romanian_english_all_shuffled_764_test1000_pipeline_en_5.5.0_3.0_1727099236724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_romanian_english_all_shuffled_764_test1000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_romanian_english_all_shuffled_764_test1000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_romanian_english_all_shuffled_764_test1000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|820.4 MB| + +## References + +https://huggingface.co/patpizio/xlmr-ro-en-all_shuffled-764-test1000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_en.md new file mode 100644 index 00000000000000..976c70f563c132 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmrobertabaseforpawsx_english XlmRoBertaForSequenceClassification from ziqingyang +author: John Snow Labs +name: xlmrobertabaseforpawsx_english +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmrobertabaseforpawsx_english` is a English model originally trained by ziqingyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmrobertabaseforpawsx_english_en_5.5.0_3.0_1727088286342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmrobertabaseforpawsx_english_en_5.5.0_3.0_1727088286342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmrobertabaseforpawsx_english","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmrobertabaseforpawsx_english", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmrobertabaseforpawsx_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|860.0 MB| + +## References + +https://huggingface.co/ziqingyang/XLMRobertaBaseForPAWSX-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_pipeline_en.md new file mode 100644 index 00000000000000..39257900f36c04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlmrobertabaseforpawsx_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmrobertabaseforpawsx_english_pipeline pipeline XlmRoBertaForSequenceClassification from ziqingyang +author: John Snow Labs +name: xlmrobertabaseforpawsx_english_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmrobertabaseforpawsx_english_pipeline` is a English model originally trained by ziqingyang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmrobertabaseforpawsx_english_pipeline_en_5.5.0_3.0_1727088356190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmrobertabaseforpawsx_english_pipeline_en_5.5.0_3.0_1727088356190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmrobertabaseforpawsx_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmrobertabaseforpawsx_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmrobertabaseforpawsx_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|860.1 MB| + +## References + +https://huggingface.co/ziqingyang/XLMRobertaBaseForPAWSX-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file